DETERMINING THE FUNCTIONS AND INTERACTIONS OF
PROTEINS BY COMPARATIVE ANALYSIS 

Related Applications 

The present application is a continuation-in-part application ("CIP") of Patent
Cooperation Treaty (PCT) International Application Serial No. PCT/US00/02246, filed in the
U.S. receiving office on January 28, 2000, and this application claims the benefit of priority
under 35 U.S.C. § 119(e) of U.S. Provisional Application Nos. 60/165,124 and 60/165,086,
both filed November 12, 1999, and U.S. Provisional Application No. 60/179,531, filed February
1, 2000. International Application Serial No. PCT/US00/02246 claims the benefit of priority
under 35 U.S.C. § 119(e) of U.S. Provisional Application Serial No. 60/117,844, filed January
29, 1999, U.S. Provisional Application Serial No. 60/118,206, filed February 1, 1999, U.S.
Provisional Application Serial No. 60/126,593, filed March 26, 1999, U.S. Provisional
Application Serial No. 60/134,093, filed May 14, 1999, and U.S. Provisional Application
Serial No. 60/134,092, filed May 14, 1999. Each of the aforementioned applications is
explicitly incorporated herein by reference in its entirety and for all purposes.

TECHNICAL FIELD 

This invention generally relates to genetics and microbiology. The invention
provides novel methods to identify the function of and relationships between nucleic acid and
protein sequences. The method is particularly useful for identifying genes and
polypeptides having potential therapeutic relevance in organisms, e.g., microorganisms, such
as Mycobacterium tuberculosis. The invention also provides Mycobacterium tuberculosis 
genes and polypeptides found by these methods. These genes and polypeptides are useful as 
potential drug targets. 

BACKGROUND 

The determination of the functions of and relationships between nucleic acid 
and protein sequences has traditionally relied on either the study of homology and sequence 
identity with genes and proteins of known function or, in the absence of informative 
homology, laborious experimental work. The availability of many complete genome 
sequences has made it possible to develop new strategies for computational determination of 
protein functions. Several methods have been developed which can predict the general 
function of proteins by analyzing their functional relationships rather than sequence
similarity. Generally, two proteins can be considered functionally related when they form
part of the same biochemical pathway or biological process. For example, although malate
dehydrogenase is not homologous to pyruvate carboxylase, and the two enzymes do not
catalyze the same reaction, they are functionally related because they both catalyze steps of a
common biochemical pathway, namely the tricarboxylic acid cycle.

New methods that can establish such functional relationships could provide 
valuable information on the functions of uncharacterized nucleic acid and protein sequences. 

The disease tuberculosis, caused by Mycobacterium tuberculosis (MTB), is one
of the world's leading killers. The World Health Organization estimates that 30 million deaths
from pulmonary tuberculosis will occur during this decade. Alarming reports on the
emergence of drug-resistant strains of this bacterium underscore the importance of the search 
for new therapeutic agents. Identifying the function of every protein produced by MTB will 
provide researchers with promising new targets for anti-tuberculosis drug design. 

SUMMARY 

The invention provides novel methods for characterizing the function of 
nucleic acids and polypeptides. The invention provides a novel method for identifying a 
nucleic acid or a polypeptide sequence that may be a target for a drug. The invention provides 
a novel method for identifying a nucleic acid or a polypeptide sequence that may be essential 
for the growth or viability of an organism. The characterization is based on use of methods of 
the invention comprising algorithms that can identify functional relationships between diverse 
sets of non-homologous nucleic acid and polypeptide sequences. Characterization of nucleic 
acid and protein sequences can be the basis for the development of compositions that can 
interact with those nucleic acids and polypeptides. For example, such characterization can 
provide a basis for screening methods. Such characterization may allow use of these 
sequences as targets for drug discovery. Discovery of such compositions can provide the 
basis for the design of novel drugs, particularly if the characterized sequences are derived 
from a pathogen. 

The invention provides a method for identifying a nucleic acid or a
polypeptide sequence that may be a target for a drug comprising the following steps: (a)
providing a first nucleic acid or a polypeptide sequence that is known to be a drug target; (b)
providing at least one algorithm selected from the group consisting of a "domain fusion"
method, a "phylogenetic profile" method and a "physiologic linkage" method, wherein the
algorithm is capable of analyzing a functional relationship between nucleic acid or polypeptide
sequences; and, (c) comparing the first nucleic acid or the polypeptide drug target sequence
to a plurality of sequences using at least one of the algorithms as set forth in step (b) to
identify a second sequence that has a functional relationship to the first sequence, thereby
identifying a nucleic acid or a polypeptide sequence that may be a target for a drug.

The invention provides a method for identifying a nucleic acid or a
polypeptide sequence that may be essential for the growth or viability of an organism
comprising the following steps: (a) providing a first nucleic acid or a polypeptide sequence
that is known to be essential for the growth or viability of an organism; (b) providing at least
one algorithm capable of analyzing a functional relationship between nucleic acid or
polypeptide sequences selected from the group consisting of a "domain fusion" method, a
"phylogenetic profile" method and a "physiologic linkage" method; and, (c) comparing the
first nucleic acid or the polypeptide sequence to a plurality of sequences using at least one of
the algorithms as set forth in step (b) to identify a second sequence that has a functional
relationship to the first sequence, thereby identifying a nucleic acid or a polypeptide
sequence that may be essential for the growth or viability of an organism.

In one aspect of the methods of the invention, the drug is an anti-microbial
drug. In another aspect, the first nucleic acid or a polypeptide sequence is derived from a
pathogen. The pathogen can be a microorganism, such as Mycobacterium tuberculosis
(MTB).

The plurality of sequences used to identify a second sequence can comprise a
database of the gene sequences of an entire genome of an organism. The plurality of
sequences used to identify a second sequence can comprise a database of the gene sequences
derived from a pathogen.

In one aspect of the methods of the invention, the "phylogenetic profile"
method algorithm comprises (a) obtaining data, comprising a list of proteins from at least two
genomes; (b) comparing the list of proteins to form a protein phylogenetic profile for each
protein, wherein the protein phylogenetic profile indicates the presence or absence of a
protein belonging to a particular protein family in each of the at least two genomes based on
homology of the proteins; and (c) grouping the list of proteins based on similar profiles,
wherein proteins with similar profiles are indicated to have a functional relationship. The
phylogenetic profile can be in the form of a vector, matrix or phylogenetic tree. The
"phylogenetic profile" method can further comprise determining the significance of
homology between the proteins by computing a probability (p) value threshold. The
probability can be set with respect to the value 1/NM, based on the total number of sequence
comparisons that are to be performed, wherein N is the number of proteins in the first
organism's genome and M is the number of proteins in all other genomes. The presence or
absence of a protein belonging to a particular protein family in each of the at least two
genomes can be determined by calculating an evolutionary distance. The evolutionary
distance can be calculated by: (a) aligning two sequences from the list of proteins; (b)
determining an evolution probability process by constructing a conditional probability
matrix $p(aa \to aa')$, where aa and aa' are any amino acids, said conditional probability
matrix being constructed by converting an amino acid substitution matrix from a log odds
matrix to said conditional probability matrix; (c) accounting for an observed alignment of the
constructed conditional probability matrix by taking the product of the conditional
probabilities for each aligned pair during the alignment of the two sequences, represented by

$$P(p) = \prod_{n} p(aa_n \to aa'_n);$$

and, (d) determining an evolutionary distance $\alpha$ from the powers equation
$p' = p^{\alpha}(aa \to aa')$, maximizing for P. The conditional probability matrix can be
defined by a Markov process with substitution rates, over a fixed time interval. The
conversion from an amino acid substitution matrix to a conditional probability matrix can be
represented by:

$$P(i \to j) = p_j \, 2^{\mathrm{BLOSUM62}_{ij}/2},$$

where BLOSUM62 is an amino acid substitution matrix, and $P(i \to j)$ is the
probability that amino acid i is replaced by amino acid j through point mutations according to
BLOSUM62 scores. In one aspect, the $p_j$'s are the abundances of amino acid j and are
computed by solving a plurality of linear equations given by the normalization condition that

$$\sum_{j} P(i \to j) = 1.$$
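
The conversion from a log odds matrix to a conditional probability matrix can be illustrated with a small sketch. It assumes the conversion has the form P(i->j) = p_j * 2^(S_ij/2), which is the usual reading for a half-bit log odds matrix such as BLOSUM62, and it solves the normalization condition (each row of P sums to 1) as a linear system for the abundances p_j. The two-letter alphabet, the score values, and the function names `solve_linear` and `conditional_probabilities` are hypothetical and chosen only to keep the arithmetic easy to check; a real calculation would use the full 20 x 20 matrix.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def conditional_probabilities(scores):
    """Return (p, P) where P[i][j] = p[j] * 2 ** (scores[i][j] / 2).

    Assumes `scores` is a square log odds matrix in half-bit units.
    """
    n = len(scores)
    weights = [[2.0 ** (scores[i][j] / 2.0) for j in range(n)] for i in range(n)]
    # Normalization: sum_j p[j] * weights[i][j] = 1 for every row i.
    p = solve_linear(weights, [1.0] * n)
    cond = [[p[j] * weights[i][j] for j in range(n)] for i in range(n)]
    return p, cond


# Hypothetical two-letter alphabet: +2 for identity, -2 for substitution.
TOY_SCORES = [[2, -2], [-2, 2]]
abundances, cond_matrix = conditional_probabilities(TOY_SCORES)
```

With these toy scores the abundances come out equal and each row of the conditional probability matrix sums to one, which is the property the normalization condition is meant to enforce.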

In alternative aspects of the methods of the invention, the "physiologic
linkage" method algorithm identifies proteins and nucleic acids that participate in a common
functional pathway; identifies proteins and nucleic acids that participate in the synthesis of a
common structural complex; and, identifies proteins and nucleic acids that participate in a
common metabolic pathway.

In one aspect of the invention, the "domain fusion" method algorithm
comprises (a) aligning a first primary amino acid sequence of multiple distinct
non-homologous polypeptides to second primary amino acid sequence of a plurality of proteins;
and, (b) for any alignment found between the first primary amino acid sequences of all of
such multiple distinct non-homologous polypeptides and at least one protein of the second
primary amino acid sequences, outputting an indication identifying the aligned second
primary amino acid sequence as an indication of a functional link between the aligned first
and second polypeptide sequences. The aligning can be performed by an algorithm selected
from the group consisting of a Smith-Waterman algorithm, a Needleman-Wunsch algorithm, a
BLAST algorithm, a FASTA algorithm, and a PSI-BLAST algorithm. The multiple distinct
non-homologous polypeptides can be obtained by translating a nucleic acid sequence from a
genome database. The plurality of proteins can have a known function. At least one of the
multiple distinct non-homologous polypeptides can have a known function. At least one of
the multiple distinct non-homologous polypeptides can have an unknown function. The
alignment can be based on the degree of homology of the multiple distinct non-homologous
polypeptides to the plurality of proteins. The "domain fusion" method can comprise
determining the significance of the aligned and identified second primary amino acid
sequence by computing a probability (p) value threshold. The probability threshold can be
set with respect to the value 1/NM, based on the total number of sequence comparisons that
are to be performed, wherein N is the number of proteins in a first organism's genome and M
is the number of proteins in all other genomes. The "domain fusion" method can further
comprise filtering excessive functional links between one first primary amino acid sequence
of multiple distinct non-homologous polypeptides and an excessive number of other distinct
non-homologous polypeptides for any alignment found between the first primary amino acid
sequences of the distinct non-homologous polypeptides and at least one of the second primary
amino acid sequences of the plurality of proteins.

The invention provides a computer program product, stored on a
computer-readable medium, for identifying a nucleic acid or a polypeptide sequence that may be a
target for a drug, the computer program product comprising instructions for causing a
computer system to be capable of: (a) inputting a first nucleic acid or a polypeptide sequence
that is known to be a drug target; (b) accessing at least one algorithm capable of analyzing a
functional relationship between nucleic acid or polypeptide sequences selected from the
group consisting of a "domain fusion" method, a "phylogenetic profile" method and a
"physiologic linkage" method; and (c) comparing the first nucleic acid or the polypeptide
drug target sequence to a plurality of sequences using at least one of the algorithms set forth
in step (b) to identify a second sequence that has a functional relationship to the first
sequence and generating an output identifying a nucleic acid or a polypeptide sequence that
may be a target for a drug.

The invention provides a computer program product, stored on a
computer-readable medium, for identifying a nucleic acid or a polypeptide sequence that may be
essential for the growth or viability of an organism, the computer program product
comprising instructions for causing a computer system to be capable of: (a) providing a first
nucleic acid or a polypeptide sequence that is known to be essential for the growth or
viability of an organism; (b) accessing at least one algorithm capable of analyzing a functional
relationship between nucleic acid or polypeptide sequences selected from the group
consisting of a "domain fusion" method, a "phylogenetic profile" method and a "physiologic
linkage" method; and, (c) comparing the first nucleic acid or the polypeptide sequence to a
plurality of sequences using at least one of the algorithms set forth in step (b) to identify a
second sequence that has a functional relationship to the first sequence and generating an
output identifying a nucleic acid or a polypeptide sequence that may be essential for the
growth or viability of an organism.

The invention provides a computer system, comprising: (a) a processor; and,
(b) a computer program product of the invention.

All publications, patents, patent applications, GenBank sequences and ATCC
deposits cited herein are hereby expressly incorporated by reference for all purposes.

The details of one or more embodiments of the invention are set forth in the
accompanying drawings and the description below. Other features, objects, and advantages
of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS 

Figure 1 is an example of functional linkages predicted between InhA (Rv
1484) and other TB genes.

Figure 2 is an example of predicted functional linkages between embB (Rv
3795), which is a target of the drug ethambutol, and other TB genes using the phylogenetic
profile method.

Figure 3 is an example of predicted functional linkages between five TB genes
having homology to penicillin binding proteins and other TB genes.

Figure 4 shows that gcpE (Rv 2868C) is predicted to be functionally linked to cell
wall metabolism.

Figure 5 shows predicted functional linkages of htrA (Rv 1223C) with other
TB genes.

Like reference symbols in the various drawings indicate like elements. 

DETAILED DESCRIPTION

The present invention provides novel methods for identifying the relationships
between and the function of nucleic acid and polypeptide sequences. The methods of the
invention identify novel genes and polypeptides on the basis of their functional linkage to
other proteins whose biological functions or processes are known or inferred by homology.

The genes and polypeptides identified by the methods of the invention can be
used in screening methods for the identification of compositions which, by binding or
otherwise interacting with the gene or polypeptide, are capable of modifying the physiology
and growth of an organism. The compositions identified by these screening methods are
useful as drugs and pharmaceuticals. Thus, genes and polypeptides identified by the methods
of the invention, including the genes and polypeptides identified herein, can be used as
potential drug targets.

One aspect of the invention provides methods for identifying the function of
genes and polypeptides from Mycobacterium tuberculosis (MTB or TB). Based on this new
functional determination, these genes and polypeptides can be used to screen for
compositions capable of modifying the physiology and growth of Mycobacterium
tuberculosis (TB). Thus, genes and polypeptides identified by the methods of the invention,
including the genes and polypeptides identified herein, can be used as targets in screening
protocols and can be useful as potential drug targets.

The functions of the TB genes and polypeptides of the present invention were
identified using the methods of the invention; i.e., they were identified on the basis of their
functional linkage to other proteins whose biological functions or processes were known by
experiment or inferred by homology. TB genes and polypeptides that are functionally linked
to genes known to be involved in pathogenesis or organism survival are potential drug
targets. Genes or polypeptides that are associated with TB pathogenesis or survival, or that
are important or unique to TB biochemical pathways, are potential drug targets. TB genes and
polypeptides that have no homologues identified in humans are potential drug targets. The
function of many of the TB genes and polypeptides identified is based on the genes or
polypeptides with which they are functionally linked.

The fact that TB genes whose function was identified using the methods of the invention
are effectively targeted by a drug (i.e., they can act as bona fide drug targets) provides proof
of principle that the invention's methods for identifying functionally linked genes can
identify TB genes and polypeptides that are drug targets. Further confirmation that the genes
identified by the methods of the invention include bona fide drug targets is provided by
the fact that genes already known to be targets for drugs have been independently identified,
or "re-discovered," by the invention's methods.

The novel TB genes described herein are identified as being functionally
related or linked to other genes, including other TB genes, such as a known TB drug target
(e.g., InhA polypeptide, which is a target of isoniazid). These functional linkages are
established using mathematical algorithms. The assignment or inference of a function to TB
genes and polypeptides based on their linkage or relatedness to other genes and polypeptides
is described in U.S. provisional application serial no. 60/165,086. Potential TB drug targets
are identified by several methods discussed herein and in further detail in U.S. provisional
application serial no. 60/134,092. Through the use of these methods, TB genes and
polypeptides have been identified as potential drug targets and are illustrated in Tables 1 and
2, and Figures 1 to 5. The nucleotide and amino acid sequences of these potential drug
targets are illustrated in Tables 3 and 4, respectively (see below).

The phrases "functional link," "functionally related" and grammatical
variations thereof, when used in reference to genes or polypeptides, mean that the genes or
polypeptides are predicted to be linked or related. A particular example of functionally
related or linked proteins is where two proteins participate in a biochemical or metabolic
pathway (e.g., malate dehydrogenase and fumarase, which are both present in the TCA
cycle). Thus, although functionally linked or related proteins may not have sequence
homology to each other, they are linked by virtue of their participation in the same
biochemical pathway. Other examples of linked or related polypeptides are where two
polypeptides are part of a protein complex, physically interact, or act upon one another.

The "domain fusion" or "Rosetta Stone" method searches protein sequences 
across all known genomes and identifies proteins that are separate in one organism but joined 
as intramolecular domains into one larger protein in another organism. Such proteins that are 
separate in some organisms but joined in others often carry out related or sequential functions 
and are therefore functionally linked. 

The phylogenetic profile method compares protein sequences across all 
known genomes and analyzes the pattern of inheritance of each protein across the different 
organisms. Proteins that have similar patterns of inheritance, either acquired or lost as a part 
of a group of proteins through evolution, are functionally linked. The gene proximity method 
identifies genes that remain physically close or "clustered" throughout evolution and are 
therefore functionally linked. 
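
The phylogenetic profile idea described above can be sketched in a few lines. The sketch assumes that homology searches have already been run, so that for each protein the set of genomes containing a homolog is known. The genome labels, protein names, and the functions `build_profiles`, `hamming` and `group_by_profile` are hypothetical and serve only as an illustration; grouping here requires identical profiles, whereas a practical implementation might also accept profiles that differ at a small number of positions.

```python
from collections import defaultdict


def build_profiles(presence, genomes):
    """Map each protein to a tuple of 1/0 flags, one per genome, in order."""
    return {
        protein: tuple(1 if g in found_in else 0 for g in genomes)
        for protein, found_in in presence.items()
    }


def hamming(profile_a, profile_b):
    """Count the genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def group_by_profile(profiles):
    """Return groups (size > 1) of proteins sharing an identical profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical data: which genomes contain a homolog of each protein.
GENOMES = ["genome1", "genome2", "genome3", "genome4"]
PRESENCE = {
    "P1": {"genome1", "genome3"},
    "P2": {"genome1", "genome3"},
    "P3": {"genome2"},
    "P4": {"genome1", "genome2", "genome3", "genome4"},
}

profiles = build_profiles(PRESENCE, GENOMES)
linked_groups = group_by_profile(profiles)
```

In this toy data set P1 and P2 are present in exactly the same genomes, so they fall into one group and would be predicted to be functionally linked. P4, which is present in every genome, carries little information because it matches any other ubiquitous protein.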

A particular example of the identification of a potential TB drug target would
be to identify a TB gene or polypeptide functionally linked to a known drug target. Anti-TB
drugs include isoniazid, rifampicin, ethambutol, streptomycin, pyrazinamide, and
thiacetazone. Isoniazid is believed to act through enoyl-acyl reductase InhA,
resulting in mycolic acid biosynthesis inhibition. Thus, TB genes or polypeptides
functionally linked to enoyl-acyl reductase InhA are potential drug targets; see Figure 1,
which shows an analysis of InhA, the target for isoniazid, the most widely used
anti-tuberculosis drug, and functional linkages to a set of genes mostly known or hypothesized to
be involved in cell wall-related processes and lipid and polyketide metabolism. Particular
examples of the identification of several TB genes and polypeptides that are functionally
related to the targets of these anti-TB drugs are shown in Figures 1 to 5.

"Domain Fusion" or "Rosetta Stone" Method 

The "domain fusion" or "Rosetta Stone" method compares protein sequences
across known nucleic acid databases (e.g., known genomes) to identify genes and proteins
that are separate entities in one organism but are joined into one larger multidomain protein
in another organism. In such cases, the two separate proteins often carry out related or
sequential functions or form part of a larger protein complex. Therefore, the general function
of one component (e.g., one or more of the unknown proteins) can be inferred from the
known function of the other component. In addition, merely identifying links between
proteins using the method described herein provides valuable information (e.g., usefulness as
a target for an antibacterial drug), regardless of whether the function of one or more of the
proteins used to form the link(s) is known. Because the two components do not have similar
amino acid sequences, the function of one could not be inferred from the other on the basis of
sequence similarity alone.

The methods for identifying drug targets (e.g., TB drug targets) described
herein (e.g., the "Rosetta Stone Method") are based on the idea that proteins that participate
in a common structural complex, metabolic pathway or biological process, or that have closely
related physiological functions, are functionally linked. In addition, these methods also are
capable of identifying proteins that interact physically with one another. Functionally linked
proteins in one organism can often be found fused into a single polypeptide chain in a
different organism. Similarly, fused proteins in one organism can be found as individual
proteins in other organisms. For example, in a first organism one might identify two
un-linked proteins "A" and "B" with unknown function. In another organism, one may find a
single protein "AB" with a part that resembles "A" and a part that resembles "B." Protein
AB allows one to predict that "A" and "B" are functionally related.

The functional activity of each distinct protein in the "Rosetta Stone" method
need not be known prior to performing the method (i.e., the function of A, B, or AB need not
be known). Using the "Rosetta Stone" method to compare and analyze several unknown
protein sequences can provide information regarding relationships of each protein absent
knowledge about the functional activity of the initially analyzed proteins themselves. For
example, the information (i.e., the links) can indicate that the proteins are part of
a common pathway, function in a related process or physically interact. Such information
need not be based on the biological function of the individual proteins.

These methods can provide information regarding links between previously
un-linked proteins that function, for example, in a concerted process. A marker, for example,
for a particular disease state is identified by the presence or absence of a protein (e.g.,
Her2/neu in breast cancer detection). Links (i.e., information) identified by the method,
which link proteins "B" and "C" to such a marker, suggest that proteins "B" and "C" are
related to the marker by function, by physical interaction or as part of a common biological
pathway. Such information is useful in designing screening methods and identifying drug
targets (e.g., TB drug targets), making diagnostics, and designing therapeutics.

In one approach, the "Rosetta Stone" method is performed by sequence
comparison that searches for incomplete "triangle relationships" between, for example, three
proteins, i.e., for two proteins A' and B' that are different from one another but similar in
sequence to another protein AB. Completing the triangle relationship provides useful
information regarding the proteins' biological function(s), functional interaction, pathway
relationships or physical relationships with other proteins in the "triangle."

Either nucleotide sequences or amino acid sequences can be used in the
methods for identifying functionally related or linked genes or polypeptides. Where a
nucleic acid sequence is to be used, it can first be translated into an amino
acid sequence. Such translation may be performed in all frames if the coding sequence is not
known. Programs that can translate a nucleic acid sequence are known in the art. In
addition, although for simplicity the description of this method discusses the use of a "pair" of
proteins in the determination of a "Rosetta Stone" protein, more than 2 may be used (e.g., 3,
4, 5, 10, 100 or more proteins). Accordingly, one can analyze chains of linked proteins, such
as "A" linked by a Rosetta Stone protein to "B" linked by a Rosetta Stone protein to "C", etc.
By this method, groups of functionally related proteins can be found and their function
identified.
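
The translation step mentioned above, in which a nucleic acid sequence is translated in all frames when the coding sequence is not known, can be sketched as follows. The sketch assumes the standard genetic code, an uppercase or lowercase DNA alphabet of A, C, G and T, and translation of the three forward frames followed by the three frames of the reverse complement. The function names `translate` and `six_frame_translations` are hypothetical; codons containing any other symbol are reported as "X" and stop codons as "*".

```python
# Standard genetic code, listed in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2); incomplete trailing codons are dropped."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations of the three forward and three reverse-complement frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return [translate(strand, frame) for strand in (forward, reverse) for frame in range(3)]


frames = six_frame_translations("ATGGCC")
```

Each of the six resulting peptide strings could then be used as a probe sequence in the searches described in this section.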

A method can start with identifying the primary amino acid sequence for a
plurality of proteins whose functional relationship is to be determined (e.g., protein A' and
protein B'). A number of source databases are available, as described above, that contain
either a nucleic acid sequence and/or a deduced amino acid sequence for use with the first
step. The plurality of sequences (the "probe sequences") are then used to search a sequence
database, e.g., GenBank (NCBI, NLM, NIH), PFAM (a large collection of multiple sequence
alignments and hidden Markov models covering many common protein domains;
Washington University, St. Louis MO) or ProDom (a database based on recursive
PSI-BLAST searches and designed as a tool to help analyze domain arrangements of proteins and
protein families, see, e.g., Corpet (1999) Nucleic Acids Res. 27:263-267), either
simultaneously or individually. Every protein in the sequence database is examined for its
ability to act as a "Rosetta Stone" protein (i.e., a single protein containing polypeptide
sequences or domains from both protein A' and protein B'). A number of different methods
of performing such sequence searches are known in the art. Such sequence alignment
methods include, for example, BLAST (see, e.g., Altschul (1990) J. Mol. Biol. 215:403-410),
BLITZ (MPsrch) (see, e.g., Brenner (1995) Trends Genet. 11:330-331; and infra), and
FASTA (see, e.g., Pearson (1988) Proc. Natl. Acad. Sci. USA 85(8):2444-2448; and infra).
The probe sequence can be any length (e.g., about 50 amino acid residues to about 1000
amino acid residues).

Probe sequences (e.g., polypeptide sequences or domains) found in a single
protein (e.g., an "AB" multidomain protein) are defined as being "linked" by that protein.
Where the probe sequences are used individually to search the sequence database, one can
mask those segments having homology to the first probe sequence found in the proteins of
the sequence database prior to searching with the subsequent probe sequence. In this way,
one eliminates any potential overlapping sequences between the two or more probe
sequences.
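
The linking rule described above can be sketched with precomputed search results. The sketch assumes that each probe sequence has already been searched against the database (for example with BLAST) and that, for every database protein it matched, the matched region is available as a (start, end) pair of residue positions. Two probes are reported as linked when they match the same database protein in regions that do not overlap, which plays the role of the masking step. The data layout and the names `overlaps` and `rosetta_links` are hypothetical.

```python
from itertools import combinations


def overlaps(region_a, region_b):
    """True if two inclusive (start, end) regions share at least one residue."""
    return region_a[0] <= region_b[1] and region_b[0] <= region_a[1]


def rosetta_links(hits):
    """Find probe pairs joined by a common database protein.

    hits: {probe: {database_protein: (start, end)}}
    Returns a list of (probe_a, probe_b, linking_protein) tuples.
    """
    links = []
    for probe_a, probe_b in combinations(sorted(hits), 2):
        shared = set(hits[probe_a]) & set(hits[probe_b])
        for protein in sorted(shared):
            if protein in (probe_a, probe_b):
                continue  # a probe cannot serve as its own linking protein
            if not overlaps(hits[probe_a][protein], hits[probe_b][protein]):
                links.append((probe_a, probe_b, protein))
    return links


# Hypothetical search results: A and B match different parts of AB,
# while C matches a region that overlaps both.
HITS = {
    "A": {"AB": (1, 200)},
    "B": {"AB": (220, 450)},
    "C": {"AB": (150, 300)},
}

links = rosetta_links(HITS)
```

In this toy example only A and B are linked through AB, because the region matched by C overlaps the regions matched by both A and B.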

The linked proteins can then be further compared for similarity with one
another by amino acid sequence comparison. Where the sequences are identical or have high
homology, such a finding can be indicative of the formation of homo-dimers, -trimers, etc.
Typically, "Rosetta Stone"-linked proteins are only kept when the linked proteins show no
homology to one another (e.g., hetero-dimers, trimers, etc.).

In another method for identifying functional linkages, a potential fusion
protein lacking any functional information that is suspected of having two or more domains
(e.g., a potential "Rosetta Stone" protein) may be used to search for related proteins. In this
method, the primary amino acid sequence of the fusion protein is determined and used as a probe
sequence. This probe sequence is used to search a sequence database (e.g., GenBank, PFAM
or ProDom). Every protein in the sequence database is examined for homology to the
potential fusion protein (i.e., multiple proteins containing polypeptide sequences or domains
from the potential fusion protein). A number of different methods of performing such
sequence searches are known in the art, e.g., BLAST, BLITZ (Biocomputing Research Unit,
University of Edinburgh, Scotland; the "MPsrch program" performs comparisons of protein
sequences against the Swiss-Prot protein sequence database using the Smith and Waterman
best local similarity algorithm), and FASTA.

Probe sequences found in more than one protein (e.g., A' and B' proteins) are
defined as being "linked" so long as at least one protein per domain containing that domain
but not the other is also identified. In other words, at least one protein or domain of the
plurality of proteins must also be found alone in the sequence database. This verifies that the
protein or domain is not an integral part of a first protein but rather a second independent
protein having its own functional characteristics.
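The linking rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rosetta_links`, the dictionary layout (protein identifier mapped to a set of domain names) and all identifiers in the usage note are hypothetical, and domain assignment is assumed to have been done already by a separate sequence search.

```python
def rosetta_links(proteins):
    """Return domain pairs linked by a fusion ("Rosetta Stone") protein.

    proteins: dict mapping a protein ID to the set of domain names it contains
    (hypothetical input layout). A pair (a, b) is reported only if some protein
    contains both domains AND each domain also occurs in a protein that lacks
    the other, i.e. each domain is also found "alone" in the database.
    """
    links = set()
    for domains in proteins.values():
        ordered = sorted(domains)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                a_alone = any(a in d and b not in d for d in proteins.values())
                b_alone = any(b in d and a not in d for d in proteins.values())
                if a_alone and b_alone:
                    links.add((a, b))
    return links
```

For instance, with a fusion protein containing domains A and B plus separate proteins containing only A and only B, the pair (A, B) is reported; a fusion of C and D is not reported if D never occurs without C.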

Statistical methods can be used to judge the significance of possible matches.
The statistical significance of an alignment score is described by the probability, P, of
obtaining a higher score when the sequences are shuffled. One way to compute a P value
threshold is to first consider the total number of sequence comparisons that are to be
performed. For example, if there are N proteins in E. coli and M in all other genomes, this
number is N x M. If a comparison of this number of random sequences would result in one
pair yielding a P value of 1/(NM) by chance, this value is then set as the threshold.
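As a worked illustration of this threshold, the short Python sketch below computes 1/(NM) and applies it to a candidate P value. The function names and the protein counts used in the usage note are illustrative assumptions, not values taken from this document.

```python
def p_value_threshold(n_query_proteins, m_other_proteins):
    """Threshold 1/(N*M): the P value expected once by chance in N*M comparisons."""
    comparisons = n_query_proteins * m_other_proteins
    if comparisons <= 0:
        raise ValueError("both protein counts must be positive")
    return 1.0 / comparisons


def is_significant(p_value, n_query_proteins, m_other_proteins):
    """True if the alignment P value is below the 1/(N*M) threshold."""
    return p_value < p_value_threshold(n_query_proteins, m_other_proteins)
```

With hypothetical counts of N = 4,000 and M = 25,000, there are 10^8 comparisons, so only alignments with a P value below 10^-8 would be retained.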

This method provides information regarding which proteins are functionally
related (e.g., related biological functions, common structural complexes, metabolic pathways
or biological processes), a subset of which physically interact in an organism.
Alignment Algorithms

To align sequences, a number of different procedures can be used that produce
a good match between the corresponding residues in the sequences. Typically, the
Smith-Waterman (Smith (1981) Adv. Appl. Math. 2:482) or Needleman-Wunsch
(Needleman (1970) J. Mol. Biol. 48:443) algorithm is used; however, other, faster
procedures such as BLAST, FASTA, PSI-BLAST (a version of BLAST for finding protein
families), or others known in the art (see infra discussion), can be used.

Filtering Methods 

The Rosetta Stone Method provides at least two pieces of information. First,
the method provides information regarding which proteins are functionally related. Second,
the method provides information regarding which proteins are physically related. Each of
these two pieces of information has different sources of prediction error. The first type
of error is introduced by protein sequences that occur in many different proteins and are
paired with many other protein sequences. The second type of error is introduced by the
frequent presence of multiple copies of similar proteins, called paralogs, in a single
organism. In general, the "Rosetta Stone" method predicts functionally related proteins well,
with no filtering of results required. However, it is possible to filter the error associated
with either the first or second type of information.

The invention recognizes that a few domains are linked to an excessive
number of other domains by a "Rosetta Stone" protein. For example, 95% of the domains
are linked to fewer than 25 other domains. However, some domains, e.g., the Src Homology
3 (SH3) domain or ATP-binding cassette (ABC) domains, link to more than a hundred other
domains. These links were filtered by removing all links involving these 5% of
domains (i.e., the domains linked to more than 25 other domains). For example, in E. coli,
without filtering, 3531 links were identified using the domain-based analysis, but after
filtering only 749 links were identified. This method improved prediction of functionally
related proteins by 28% and physically related proteins by 47%. Accordingly, there are a
number of ways to filter the results to improve the significance of the functional links. As
described above, as the number of functional links increases there is a higher
chance of finding a "Rosetta Stone" protein. By removing the excessively linked proteins one
reduces the number of chance "Rosetta Stone" proteins, thereby increasing the significance of
a functional link.
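A minimal sketch of this filtering step follows, assuming the links are available as a list of domain-name pairs. The function name `filter_promiscuous` and the parameter `max_partners` are hypothetical; the default cutoff of 25 partners is the figure quoted above.

```python
from collections import defaultdict


def filter_promiscuous(links, max_partners=25):
    """Drop every link that involves a domain linked to more than max_partners others.

    links: iterable of (domain_a, domain_b) pairs (hypothetical input layout).
    """
    links = list(links)
    partners = defaultdict(set)
    for a, b in links:
        partners[a].add(b)
        partners[b].add(a)
    # Domains such as SH3 or ABC would typically land in this set.
    promiscuous = {d for d, linked in partners.items() if len(linked) > max_partners}
    return [(a, b) for a, b in links
            if a not in promiscuous and b not in promiscuous]
```

The cutoff is exposed as a parameter because the appropriate value presumably depends on the size and composition of the domain database being searched.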

Error introduced by multiple paralogs of linked proteins should have little
effect on functional prediction, as paralogs usually have very similar function, but will affect
the reliability of prediction of protein-protein interactions. This error is calculated for
each linked protein pair, and can be estimated roughly as:

$$\text{Fractional Error} = 1 - \frac{\sqrt{N}}{N} ,$$

where N is the number of paralogous protein pairs (e.g., A linked to B, A' linked to
B', A linked to B', and A' linked to B, in the case that A and A' are paralogs, as are B
and B', and the linking protein is AB as above).

The error can also be estimated as 1 - T, where T is the mean percent of
potential true positives calculated for all domain pairs in an organism. For each domain pair
linked by a Rosetta Stone protein, there are n proteins with the first domain but not the
second, and m proteins with the second domain but not the first. The percent of true
positives T is therefore estimated as the smaller of n or m divided by n times m. As T
can be calculated for each set of linked domains, it can describe the confidence in any
particular predicted interaction.
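The estimate of T for a single domain pair can be written out directly. The sketch below is illustrative only; the function names are hypothetical, and n and m are the counts defined in the preceding paragraph.

```python
def true_positive_fraction(n, m):
    """T for one domain pair: the smaller of n and m, divided by n times m.

    n: number of proteins with the first domain but not the second.
    m: number of proteins with the second domain but not the first.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must both be at least 1 for a linked pair")
    return min(n, m) / (n * m)


def interaction_error(n, m):
    """Estimated error 1 - T for a predicted physical interaction."""
    return 1.0 - true_positive_fraction(n, m)
```

For example, with two paralogs on each side (n = m = 2) there are four candidate pairings, T is 0.5 and the estimated error is 0.5. Algebraically, T reduces to 1 divided by the larger of n and m.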

In addition, error in functional links can be caused by small conserved
regions or common repeated amino acid sequences in a "Rosetta Stone" protein being
matched by a plurality of distinct non-homologous polypeptides. To reduce this error,
the percent identity between the "Rosetta Stone" protein and the distinct non-homologous
polypeptide can be measured. Alignment percentages of about 50% to about 90%, or,
alternatively, about 75%, between the "Rosetta Stone" protein and the distinct polypeptide
are indicative of links that do not arise from such small peptide sequences.

Phylogenetic Pathway Method 

The "phylogenetic profile" method compares protein sequences across all
known genomes and analyzes the pattern of inheritance of each protein across the different
organisms. In its simplest form, each protein is simply characterized by its presence or
absence in each organism. For example, if there are 16 known genomes, then each protein
may be assigned a 16-bit code or phylogenetic profile. Since proteins that function together
(e.g., in the same metabolic pathway or as part of a larger functional or structural complex)
evolve in a correlated fashion, they should have the same or similar patterns of inheritance,
and therefore similar phylogenetic profiles. Therefore, the function of one protein may be
inferred from the function of another protein which has a similar profile, if its function is
known. As with the Rosetta Stone method, the function of one protein is inferred from the
function of another protein which is dissimilar in sequence. Furthermore, the predicted link
between the proteins has utility in developing, for example, drug targets, diagnostics and
therapeutics.

The phylogenetic profile method can be implemented in a binary code (i.e., 
describing the presence or absence of a given protein in an organism) or a continuous code 
that describes how similar the related sequences are in the different genomes. In addition, 
grouping of similar protein profiles may be made wherein similar profiles are indicative of 
functionally related proteins. Furthermore, the requirements for similarity can be modified
depending upon particular criteria by varying the number of bits that are required to match.
For example, the criteria can require that all 16 bits of the two profiles be identical, or can
be relaxed so that identity in 15 of the 16 bits also indicates relatedness of the protein
profiles. Statistical methods can be used to determine how similar two patterns must be in
order to be related.
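The adjustable bit-matching criterion can be illustrated with a short Python sketch. It assumes that profiles are stored as strings of '0' and '1' characters of equal length; the function names and the `max_differences` parameter are hypothetical.

```python
def bit_differences(profile_a, profile_b):
    """Number of genome positions at which two binary profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(profile_a, profile_b))


def profiles_related(profile_a, profile_b, max_differences=0):
    """True if the profiles differ in at most max_differences bits.

    max_differences=0 requires identical profiles; max_differences=1
    corresponds to 15 of 16 bits matching for 16-genome profiles.
    """
    return bit_differences(profile_a, profile_b) <= max_differences
```

Two 16-bit profiles that differ only in the last genome are unrelated under the strict criterion but related once one mismatch is tolerated.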

The phylogenetic profile method is applicable to any genome including, e.g., 
viral, bacterial, archaeal or eukaryotic. The method of phylogenetic profile grouping 
provides the prediction of function for a previously uncharacterized protein(s). The method 
also allows prediction of new functional roles for characterized proteins based upon 
functional linkages. It also provides potential informative connections (i.e., links) between 
uncharacterized proteins. 

To represent the subset of organisms that contain a homolog, a phylogenetic
profile is constructed for each protein. The simplest manner to represent a protein's
phylogenetic history is via a binary phylogenetic profile. This profile is a
string with N entries, each one bit, where N corresponds to the number of genomes. The
number of genomes can be any number of two or more (e.g., 2, 3, 4, 5, 10, 100, to 1000 or
more). The presence of a homolog to a given protein in the nth genome is indicated with an
entry of unity at the nth position (e.g., in a binary system an entry of 1). If no homolog is
found the entry is zero. Proteins are clustered according to the similarity of their
phylogenetic profiles. Similar profiles show a correlated pattern of inheritance, and by
implication, functional linkage. The method predicts that the functions of uncharacterized
proteins are likely to be similar to characterized proteins within a cluster.
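Constructing binary profiles and grouping proteins that share an identical profile might look like the following sketch. The input layout (each protein mapped to the set of genomes in which a homolog was detected) and the function names are assumptions made for illustration; homolog detection itself is taken as already done.

```python
from collections import defaultdict


def build_profiles(homolog_genomes, genomes):
    """Return protein -> binary profile string, one character per genome.

    homolog_genomes: dict mapping a protein ID to the set of genomes that
    contain a homolog (hypothetical layout). genomes: ordered list of genomes.
    """
    return {
        protein: "".join("1" if g in present else "0" for g in genomes)
        for protein, present in homolog_genomes.items()
    }


def group_identical(profiles):
    """Group proteins that share exactly the same profile string."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return dict(groups)
```

Exact matching is the strictest form of clustering; a distance-based criterion can be substituted where near-identical profiles should also be grouped.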

In order to decide whether a genome contains a protein related to another
particular protein, the query amino acid sequence is aligned with each of the proteins from
the genome(s) in question using a known alignment algorithm (see above). To determine the
statistical significance of any alignment score, the probability, p, of obtaining a higher score
when the sequences are shuffled is computed. One way to compute a p value threshold is to
first consider the total number of sequence comparisons that are being made. If there are N
proteins in a first organism's genome and M in all other genomes, this number is N x M. If
this number of comparisons were made between random sequences, it would be expected that
one pair would yield a p value of 1/(NM) by chance. This value can be set as a threshold.
Other thresholds may be used and will be recognized by those of skill in the art.

A non-binary phylogenetic profile can be used. In this method, the
phylogenetic profile is a string of N entries where the nth entry represents the evolutionary
distance of the query protein to the homolog in the nth genome. To define an evolutionary
distance between two sequences, an alignment between the two sequences is performed. Such
alignments can be carried out by any number of algorithms known in the art (for examples,
see those described above). The evolution is represented by a Markov process with
substitution rates, over a fixed interval of time, given by a conditional probability matrix:

$$p(aa \to aa')$$

where aa and aa' are any amino acids. One way to construct such a matrix is to
convert the BLOSUM62 amino acid substitution matrix (or any other amino acid
substitution matrix, e.g., PAM100, PAM250) from a log odds matrix to a conditional
probability (or transition) matrix:

$$P(i \to j) = p_j \, 2^{\mathrm{BLOSUM62}_{ij}/2} . \qquad (1)$$

$P(i \to j)$ is the probability that amino acid i will be replaced by amino acid j through
point mutations according to the BLOSUM62 scores. The $p_j$'s are the abundances of amino
acid j and are computed by solving the 20 linear equations given by the normalization
conditions that:

$$\sum_j P(i \to j) = 1 . \qquad (2)$$

The probability that this process accounts for the observed alignment is computed
by taking the product of the conditional probabilities for each aligned pair:

$$P(p) = \prod_n p(aa_n \to aa'_n) . \qquad (3)$$

A family of evolutionary models is then tested by taking powers of the
conditional probability matrix: $p' = p^{\alpha}(aa \to aa')$. The power $\alpha$ that maximizes P is
defined to be the evolutionary distance.
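The search over powers of the conditional probability matrix can be sketched as follows. To keep the example short and self-contained, it uses a toy two-letter alphabet and a made-up 2 x 2 transition matrix in place of a 20 x 20 matrix derived from BLOSUM62, and it scans integer powers only; all names, the toy matrix and the `max_power` limit are assumptions for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def log_likelihood(matrix, aligned_pairs, index):
    """Log of the product of conditional probabilities over aligned pairs."""
    return sum(math.log(matrix[index[x]][index[y]]) for x, y in aligned_pairs)


def evolutionary_distance(p, aligned_pairs, alphabet, max_power=50):
    """Return the integer power of p that maximizes the alignment likelihood."""
    index = {residue: k for k, residue in enumerate(alphabet)}
    best_alpha, best_score = None, -math.inf
    current = p  # holds p raised to the power alpha, starting with alpha = 1
    for alpha in range(1, max_power + 1):
        score = log_likelihood(current, aligned_pairs, index)
        if score > best_score:
            best_alpha, best_score = alpha, score
        current = mat_mul(current, p)
    return best_alpha


# Toy stand-in for a 20 x 20 amino acid transition matrix (rows sum to 1).
TOY_ALPHABET = "AB"
TOY_P = [[0.9, 0.1],
         [0.1, 0.9]]
```

Aligned pairs can be produced with, e.g., `list(zip("AABA", "AAAA"))`. Identical sequences are assigned the smallest distance scanned, and the assigned distance grows as more aligned positions differ. Working with log probabilities avoids numerical underflow when the alignment is long.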

Many other schemes may be imagined to deduce the evolutionary distance
between two sequences. For example, one might simply count the number of positions in the
sequence where the two proteins have adopted different amino acids.

Although the phylogenetic history of an organism can be presented as a vector
(as described above), the phylogenetic profiles need not be vectors, but may be represented
by matrices. Such a matrix includes all the pairwise distances between a group of homologous
proteins, each one from a different organism. Similarly, phylogenetic profiles could be
represented as evolutionary trees of homologous proteins. Functional proteins could then be
clustered or grouped by matching similar trees, rather than vectors or matrices.

In order to predict function, different proteins are grouped or clustered
according to the similarity of their phylogenetic profiles. Similar profiles indicate a
correlated pattern of inheritance, and by implication, functional linkage.

Grouping or clustering may be accomplished in many ways. The simplest is
to compute the Euclidean distance between two profiles. Another method is to compute a
correlation coefficient to quantify the similarity between two profiles. All profiles within a
specified distance of the query profile are considered to be a cluster or group.
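Both similarity measures, and the rule that all profiles within a specified distance of the query form a cluster, can be sketched as below. Profiles are assumed to be equal-length lists of numbers (binary or continuous); the Pearson coefficient is used here as one possible correlation coefficient, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient between two profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


def cluster_around(query, profiles, max_distance):
    """Names of all profiles within max_distance of the query profile."""
    return [name for name, profile in profiles.items()
            if euclidean(query, profile) <= max_distance]
```

For binary profiles the squared Euclidean distance equals the number of differing bits, so a `max_distance` of 1 admits profiles that differ from the query in at most one genome.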

Typically a genome database will be used as a source of sequence
information. Where the genome database contains only the nucleic acid sequence, that
sequence is translated to an amino acid sequence in frame (if known) or in all frames if
unknown. Direct comparison of the nucleic acid sequences of two or more organisms may
be feasible but will likely be more difficult due to the degeneracy of the genetic code.

Programs capable of translating a nucleic acid sequence are known in the art or easily 
programmed by those of skill in the art to recognize a codon sequence for each amino acid. 
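A translation routine of the kind referred to above can be sketched briefly. The example assumes the standard genetic code and interprets "all frames" as the three reading frames on each strand (six in total); both are assumptions, as are the function names. Codons containing characters other than A, C, G or T are rendered as 'X'.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated with T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    """Reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translations(dna):
    """Translations of the three forward frames, then the three reverse frames."""
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]
```

Each of the six translations can then be searched against the protein sequences of the other genomes in the same way as a native protein sequence.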

The phylogenetic profile provides an indication of those proteins in each of the
at least two organisms that share some degree of homology. Such a comparison can be done
by any number of alignment algorithms known in the art or easily developed by one skilled
in the art (see, for example, those listed above, e.g., BLAST, FASTA, etc.). In addition,
thresholds can be set regarding a required degree of homology. Each protein is then grouped
at 224 with related proteins that share a similar phylogenetic profile using grouping
algorithms.

"Functionally-, Structurally- or Metabolically- Linked" Method

The "physiologic linkage" method is a computational method that detects (i.e.,
identifies) proteins, and the genes that encode them, that participate in a common functional
pathway (e.g., cell motility or cell division), that participate in the synthesis of the same or a
similar structural complex (e.g., a cell wall) or participate in the same or similar metabolic
pathway (e.g., glycolysis, lipid synthesis, and the like). Proteins within these common
functional pathway groups are examples of "functionally linked" proteins. Having a
common functional "goal," they evolve in a correlated fashion. Thus, "homologs" in
different organisms can be comparatively identified. While these detection methods are very
effective in identifying functional homologies in the same subset of organisms, functional
linkages can also be made between widely genetically disparate organisms.

In one aspect, metabolic pathway links are defined as links between proteins that
operate in the same metabolic pathway, which can be identified by sequence identity
searching, e.g., by performing a BLAST search to find top-scoring polypeptides with high
similarity (BLAST alignment E-value < $10^{-20}$) to polypeptides identified in a known
pathway. For example, M. tuberculosis proteins were so analyzed against E. coli proteins;
MTB proteins whose E. coli homologs (i.e., having high similarity by BLAST alignment) act
adjacently in metabolic pathways as defined in the EcoCyc database (see, e.g., Karp (1998)
Nucleic Acids Res. 26:50-53) were identified.
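The homolog-mapping step described above could be organized as in the following sketch. It assumes that BLAST results have already been parsed into (query, subject, E-value) tuples and that pathway adjacency is available as a set of unordered protein pairs; these data layouts, the function names and the identifiers in the usage note are all hypothetical, and no particular BLAST or EcoCyc programming interface is implied.

```python
E_VALUE_CUTOFF = 1e-20  # cutoff quoted in the text above


def best_hits(blast_hits, cutoff=E_VALUE_CUTOFF):
    """Map each query to its top-scoring subject with E-value below the cutoff.

    blast_hits: iterable of (query_id, subject_id, e_value) tuples.
    """
    best = {}
    for query, subject, e_value in blast_hits:
        if e_value < cutoff and (query not in best or e_value < best[query][1]):
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}


def pathway_links(blast_hits, adjacent_pairs):
    """Link query proteins whose best homologs act adjacently in a known pathway.

    adjacent_pairs: set of frozenset({protein_a, protein_b}) for reference
    proteins that catalyze adjacent pathway steps.
    """
    homolog = best_hits(blast_hits)
    queries = sorted(homolog)
    links = []
    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            if frozenset((homolog[a], homolog[b])) in adjacent_pairs:
                links.append((a, b))
    return links
```

For example, if `mtbA` and `mtbB` have best hits `ecoX` and `ecoY`, and `ecoX` and `ecoY` are adjacent in a reference pathway, the pair (`mtbA`, `mtbB`) is reported as a metabolic pathway link.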

In another example, flagellar proteins are found in bacteria that possess
flagella but not in other organisms. Accordingly, if two proteins have homologs in the same
subset of fully sequenced organisms, they are likely to be functionally linked. The methods
of the invention use this concept to systematically map links between all the proteins coded
by a genome.

Typically, functionally linked proteins have no amino acid sequence similarity 
with each other and, therefore, cannot be linked by conventional sequence alignment 
techniques. Accordingly, the methods of the invention identify drug targets that could not be 
identified using conventional sequence comparison (i.e., sequence homology or sequence 
identity) techniques. 

Prediction of functionally linked proteins by the "phylogenetic method" can
also be used in conjunction with the "domain fusion" or "Rosetta Stone" method and also can
be filtered by other methods that predict functionally linked proteins, such as the protein
phylogenetic profile method or the analysis of correlated mRNA expression patterns. When
the Rosetta Stone predictions for S. cerevisiae were filtered by these two methods, it was
found that proteins predicted to be functionally linked by two or more of these three methods
were as likely to be functionally related as proteins that were observed to physically interact
by experimental techniques such as yeast two-hybrid or co-immunoprecipitation methods.
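Requiring agreement between two or more prediction methods amounts to a simple vote over protein pairs, as in the sketch below. The function name, the `min_methods` parameter and the representation of each method's output as a list of protein pairs are assumptions made for illustration.

```python
from collections import Counter


def consensus_links(*method_predictions, min_methods=2):
    """Return the protein pairs predicted by at least min_methods methods.

    Each positional argument is one method's predictions, given as an iterable
    of (protein_a, protein_b) pairs. Pair order is ignored.
    """
    votes = Counter()
    for predictions in method_predictions:
        # De-duplicate within a method so each method votes once per pair.
        for pair in {frozenset(p) for p in predictions}:
            votes[pair] += 1
    return {pair for pair, count in votes.items() if count >= min_methods}
```

With three prediction sets (for example domain fusion, phylogenetic profiles and correlated expression), the default setting keeps only the links supported by at least two of the three.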

For example, a combination of these methods of prediction can be used to
establish links between proteins of closely related function. The methods of the invention
(i.e., the "Rosetta Stone" method and the "phylogenetic profile" method) can be combined
with one another or with other protein prediction methods known in the art; see, for example,
Eisen (1998) "Cluster analysis and display of genome-wide expression patterns," Proc. Natl.
Acad. Sci. USA 95:14863-14868.

The various techniques, methods, and variations thereof described can be 
implemented in part or in whole using computer-based systems and methods. Additionally, 
computer-based systems and methods can be used to augment or enhance the functionality 
described above, increase the speed at which the functions can be performed, and provide 
additional features and aspects as a part of or in addition to those of the invention described 
elsewhere in this document. Various computer-based systems, methods, and 
implementations in accordance with this technology are described herein. 

Proteins linked to current drug targets 

The invention also provides a novel method for identifying a polypeptide, or
the nucleic acid sequence that encodes it, that is a target for a drug. The method analyzes the
functional relationship between at least two sequences, wherein at least one of the sequences
is a known target of a drug or encodes a polypeptide drug target. The method comprises
identifying proteins, and the genes that encode them, that are functionally linked to the
targets of known drugs. The functional linkage is determined by using the "domain fusion"
method, the "phylogenetic profile" method or the "physiologic linkage" method, or a
combination thereof, as described herein.

Thus, this aspect of the invention provides methods for identifying drug targets
from among all or a subset of genes in a genome using computationally-determined
functional linkages. In one implementation of the method, functional linkages are calculated
using the "domain fusion" method, the "phylogenetic profile" method or the "physiologic
linkage" method, or a combination thereof, between all "query genome genes." Next, each
set of genes predicted to be functionally linked to either a known drug target or to a sequence
homolog or ortholog (defined below) of a known drug target is examined. These proteins
(and the nucleic acids that encode them) are functionally linked to known drug targets; thus,
they are operating in the same pathways or systems targeted by the known drug.
Accordingly, the methods of the invention identify them as drug targets.
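Once functional linkages have been computed, selecting the genes linked to known drug targets is a simple lookup, sketched below. The function name, the pair-list representation of the linkages and the placeholder gene names in the usage note are hypothetical.

```python
def candidate_targets(links, known_targets):
    """Map each gene linked to a known drug target to the target(s) it is linked to.

    links: iterable of (gene_a, gene_b) functional linkages.
    known_targets: genes that are known drug targets (or their homologs).
    """
    known = set(known_targets)
    candidates = {}
    for a, b in links:
        for target, partner in ((a, b), (b, a)):
            if target in known and partner not in known:
                candidates.setdefault(partner, set()).add(target)
    return candidates
```

For example, with linkages (`geneT`, `geneX`) and (`geneX`, `geneY`) and `geneT` as the only known target, only `geneX` is returned, since `geneY` is not directly linked to a known target. Extending the search to second-order links would be a further design choice.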

This method is particularly effective for identifying drug targets in pathogens,
such as microorganisms, e.g., bacteria, viruses and the like. This method allows for the
identification of novel drug targets that cannot be identified by other techniques, such as
traditional sequence homology or sequence identity comparison techniques. Several known
drug targets in M. tuberculosis were analyzed with the methods of the invention, using
functional linkages to identify potential new drug targets in the same pathways as the known
drug targets.

There are very few drugs that are effective for anti-tuberculosis therapy, since
the complex lipid-rich mycobacterial cell wall is impermeable to many antibacterial agents.
Additionally, single- and multi-drug resistance is rapidly emerging against these drugs. To
address this issue, the methods of the invention were used to identify Mycobacterium
tuberculosis (MTB or TB) proteins that are functionally linked to the targets of known drugs.
Inhibiting these proteins should have the same effect on the organism as the drug, since the
same processes or pathways would be disrupted. Targeting multiple components of a given
biochemical pathway would also diminish the opportunity for the development of resistance
because various related proteins would have to mutate against inhibitors while preserving the
overall functionality of the pathway.

A list of targets of essential anti-TB drugs (World Health Organization,
Geneva, Switzerland) was compiled. The anti-TB drugs included isoniazid, rifampicin,
ethambutol, streptomycin, pyrazinamide and thiacetazone. Although not enough is known
about the molecular basis of action of the latter two, the functional linkages of the known
drug targets were examined.

Isoniazid. This is one of the most widely used of all anti-tuberculosis drugs.
It is believed that the compound is activated by the catalase-peroxidase KatG. Once
activated, it then attaches to a nicotinamide adenine dinucleotide bound to the enoyl-acyl
carrier protein reductase InhA, resulting in the inhibition of mycolic acid biosynthesis
(see, e.g., Rozwarski (1998) Science 279:98-102).

Using the "phylogenetic profile" method, the inhA gene was "linked" to, or functionally
associated with, two polyketide synthases, pks1 and pks6 (Figure 1), both of which contain
acyl carrier protein motifs. The polyketide synthase pks6 is in turn known from established
metabolic pathways to be linked to the fatty acid biosynthesis gene accD3. Further, pks6 is
linked to fadD28 and to the operon containing the genes ppsA-E, all recently reported to be
crucial for bacterial replication in host lungs (see, e.g., Cox (1999) Nature 402:79-83).

The inhA gene was also linked to an operon encoding two putative
oxidoreductases and a gene of entirely unknown function. The inhA gene was further linked
to a second operon that includes pepR and gpsI. PepR is a protease whose Bacillus subtilis
homolog is adjacent to the genes coding for enzymes that synthesize diaminopimelate, a
component of the cell wall incorporated by the murE gene product, and diaminopicolinate
(see, e.g., Chen (1993) J. Biol. Chem. 268:9448-9465). PepR is an ortholog of an essential
yeast gene and is likely to be essential for MTB (see below). GpsI is a putative
multifunctional enzyme involved in guanosine pentaphosphate synthesis and
polyribonucleotide nucleotidyltransfer. The high reliability of the predicted functional link
between gpsI and pepR and the absence of eukaryotic homologs suggest that gpsI could be a
promising target for drug design.

Rifampicin. This compound, along with the related rifabutin and KRM-1648,
is believed to act by directly targeting the RNA polymerase β-subunit (rpoB), given that
96% of resistant isolates were found to have mutations of various types in a limited region of
the rpoB gene (see, e.g., Yang (1998) J. Antimicrob. Chemother. 42:621-628).

Using the methods of the invention, as expected, functional linkages were
found to another RNA polymerase subunit, rpoC, as well as to various tRNA synthases and
ribosomal proteins. However, no functional links to uncharacterized proteins were found.

Ethambutol. This drug is effective against tuberculosis when used in
combination with isoniazid. It is believed that the drug interacts with the EmbB protein, a
probable arabinosyl-transferase, inhibiting the biosynthesis of arabinan, a component of
cell-envelope lipids. As with rifampicin, the evidence for this interaction is indirect, since
mutations in the embB gene are responsible for ethambutol resistance (see, e.g., Lety (1997)
Antimicrob. Agents Chemother. 41:2629-2633).

The "gene proximity" method correctly clusters embB with embA (Rv3794).
This cluster is linked to a set of mostly uncharacterized genes by the "phylogenetic profile"
method; see Figure 2, which shows an analysis of EmbB, the target for the anti-tuberculosis
drug ethambutol, and shows functional linkages to genes mostly of unknown function but
with some indications of localization at the bacterial membrane.

Two of the uncharacterized genes, Rv1706c and Rv1800, belong to the
abundant PE/PPE family of proteins hypothesized to be a source of antigenic variation with
the potential ability to interfere with immune responses by inhibiting antigen processing (see,
e.g., Cole (1998) Nature 393:537-544). A third uncharacterized gene, Rv1967, belongs to
one of the four copies of the mce operon. This operon consists of eight genes coding for
integral membrane proteins and proteins that have N-terminal signal sequences or
hydrophobic segments and are believed to be involved in pathogenicity (see, e.g., Cole
(1998) supra). Rv0528 codes for a hypothetical membrane protein and Rv2159c corresponds
to the murF gene, which participates in the biosynthesis of peptidoglycan precursors.

The majority of the "links," or functionally associated sequences, involved
proteins associated with processes related to the bacterial cell wall (with the possible
exception of atsA and the putative choline dehydrogenase Rv1279, whose relationship to
these processes is not immediately obvious). The proteins of unknown function are therefore
also expected to play some role in these processes and are thus of interest as potential drug
targets.
Streptomycin. This drug acts by binding to the 16S rRNA and inhibits protein
synthesis. Resistance to this compound emerges from mutations in the corresponding gene
(rrs), as well as in the gene encoding the ribosomal protein S12 (rpsL). Disruptions to
RpsL effect streptomycin resistance by altering the higher order structure of 16S rRNA (see,
e.g., Sreevatsan (1996) Antimicrob. Agents Chemother. 40:1024-1026).

Although streptomycin does not directly target RpsL, the functional links
generated for this protein were examined, as any target whose inhibition will ultimately
disrupt bacterial protein synthesis is likely to be an effective anti-growth/anti-microbial
target. As with the rifampicin target, the only functional linkages found for this protein were
the expected protein synthesis-related proteins, including large ribosomal subunit proteins
L2, L5, L11, and L14; small ribosomal subunit proteins S4, S5, S7, S8, and S11; elongation
factors fusA and Ef-Tu; the chaperones GroEL, clpB and ftsH; and the Clp protease subunits
clpC and clpX.

Proteins linked to cell-wall related proteins 

The invention also provides a novel method for identifying a nucleic acid or a 
polypeptide sequence in an organism that is linked to a cell-wall related protein. The method 
analyzes the functional relationship between at least two sequences, wherein at least one of 
the sequences is a cell-wall related protein, or, the sequence is a nucleic acid sequence that 
encodes a cell-wall related protein. The method comprises identifying proteins, and the 
genes that encode them, that are functionally linked to a cell-wall related protein. The 
functional linkage is determined by using the "domain fusion" method, the "phylogenetic 
profile" method or the "physiologic linkage" method, or a combination thereof, as described 
herein. 

Approximately eleven M. tuberculosis proteins are indicated by sequence
homology to be penicillin-binding proteins, thought to synthesize peptidoglycan in the course
of cell elongation and cell wall metabolism (see, e.g., Broome-Smith (1985) Eur. J. Biochem.
147:437-446). Using the methods of the invention, the functional linkages found for these
proteins map out many of the known cell wall synthetic enzymes and reveal more than 10
proteins of unknown function that may also participate in cell wall metabolism. Figure 3
shows an analysis of five of the approximately eleven MTB proteins presumed to bind
penicillin, revealing functional linkages to various potential operons consisting of genes
involved in various aspects of cell wall metabolism, including cell shape determination and
peptidoglycan biosynthesis, as well as more than ten genes of unknown function, which can
now be associated with cell wall metabolism.

Three of the proteins (pbpA, pbpB, and ponA1) reside in conserved gene
clusters, presumably operons. Other genes in the clusters around pbpA and pbpB are also
implicated in cell wall metabolism. For example, pbpA resides next to rodA, a
membrane-associated protein whose E. coli homolog determines cell shape and is required for
enzymatic activity of penicillin binding proteins (see, e.g., Matsuzawa (1989) J. Bacteriol.
171:558-560). Likewise, pbpB resides next to six peptidoglycan biosynthesis genes and the
two septum and cell wall formation proteins ftsW and ftsZ.

Two additional gene clusters were linked to these penicillin binding proteins
by either the "phylogenetic profile" or "Rosetta Stone" pattern methods of the invention.
One cluster is composed of the peptidoglycan synthetic protein murB and a putative
membrane protein of unknown function that the functional linkages suggest is involved in
cell wall metabolism. The second gene cluster contains four genes, three of which are
predicted to reside in the cell membrane or envelope. Therefore, the uncharacterized genes
in these clusters are likely to be involved in cell wall metabolism, closely related to the
function of the penicillin binding proteins, and are therefore promising drug targets.

Another gene linked to cell wall metabolism by the computationally-derived
linkage methods of the invention is gcpE; see Figure 4, which shows that the uncharacterized
gene gcpE, known to be essential for bacterial survival (see, e.g., Baker (1992) FEMS
Microbiol. Lett. 73:175-180), is predicted to be involved in cell wall metabolism through its
functional links to a putative membrane protein and two murein hydrolase genes, lytB1 and
lytB2, involved in cell separation. The genes forming a putative operon with gcpE are
proposed as potential drug targets. The functional linkages place gcpE in a conserved gene
cluster with two genes of unknown function, one of which encodes a membrane protein.
However, the three genes show correlated inheritance with two homologs of lytB, an E. coli
gene involved in penicillin tolerance (see, e.g., Gustafson (1993) J. Bacteriol. 175:1203-1205)
and recently shown to encode a murein hydrolase essential for cell separation (see, e.g.,
Garcia (1999) Mol. Microbiol. 31:1275-1277). The uncharacterized proteins from this
cluster are therefore expected to participate in processes similar to GcpE and might therefore
be promising drug targets.

Proteins linked to potentially novel pathways 

The invention also provides a novel method for identifying a polypeptide, or a 
nucleic acid that encodes it, that is linked to potentially novel biochemical (e.g., biosynthetic, 
metabolic) pathways. The method analyzes the functional relationship between at least two 
sequences, wherein at least one of the sequences is associated with a biochemical pathway, 
such as a pathway in a microorganism that enables the pathogen to evade an immune process. 
The method comprises identifying proteins, and the genes that encode them, that are 
functionally linked to the pathway-linked sequences. The functional linkage is determined
by using the "domain fusion" method, the "phylogenetic profile" method or the "physiologic
linkage" method, or a combination thereof, as described herein.

For example, the htrA gene encodes for a putative heat shock protein 
homologous to HtrA from Salmonella typhimurium, a serine protease that degrades aberrant 
periplasmic proteins. Mutations in this protein have been linked with reduced viability in 
host macrophages (see, e.g., Johnson (1991) Mol. Microbiol. 5:401-407). Thus, it was 
decided to investigate the function of htrA. Using the methods of the invention, results 
indicated that the htrA protein is part of a process that has not yet been characterized. The 
gene is predicted with very high reliability to function with the uncharacterized gene
Rv1224c; see Figure 5, which shows the involvement of htrA in a potentially novel pathway.
The gene encoding the putative heat shock protein HtrA is functionally linked to a set of
genes mostly of unknown function, suggesting the existence of a novel pathway. The
partially characterized proteins suggest that the pathway relates to membrane-associated
processes such as signaling and/or transport. The lack of eukaryotic homologs for most of
the genes linked to htrA suggests that proteins of this pathway could be promising drug
targets.

Through its phylogenetic profile, htrA is linked to a group of uncharacterized 
proteins, including a putative lipid esterase (Rv1900c), an ABC transporter (Rv3783) and the
uncharacterized protein Rv1216c, which has weak homology to the laminin B receptor of
Xenopus laevis, suggesting that it might be a membrane protein. From this analysis, it can be 
concluded that htrA is part of a novel pathway that involves membrane-associated processes,
such as signaling and/or transport. Because the majority of the proteins linked to htrA have
no eukaryotic homologs, and given the importance of htrA in S. typhimurium pathogenesis, 
this pathway represents another potential source of novel targets for anti-tuberculosis drugs. 

Proteins linked to essential proteins 

The invention also provides a novel method for identifying a polypeptide, or
the nucleic acid sequence that encodes it, that is linked to an essential protein (e.g., a protein
necessary for the growth of an organism, such as a bacterium). The method analyzes the
functional relationship between at least two sequences, wherein at least one of the sequences
is linked to an essential protein, or, the sequence is a nucleic acid sequence that itself is
essential or encodes a polypeptide linked to an essential protein. The functional linkage is
determined by using the "domain fusion" method, the "phylogenetic profile" method or the
"physiologic linkage" method, or a combination thereof, as described herein.

For example, the MIPS database (Munich Information Center for Protein
Sequences; MIPS provides access through its WWW server to a spectrum of generic
databases, including PEDANT, MYGD, MATD, MEST, the PIR-International Protein
Sequence Database, the protein family database PROTFAM, the MITOP database, and the
all-against-all FASTA database; see, e.g., Mewes (1999) Nucleic Acids Res. 27:44-48)
contains a list of 734 genes that are essential for Saccharomyces cerevisiae viability (see,
e.g., Mewes (1999) supra). A list of Mycobacterium tuberculosis genes orthologous to these
essential genes was generated. Using the methods of the invention, 60 such genes were
found. The products of these genes have a high likelihood of also being essential to the
tuberculosis bacterium and therefore could be promising therapeutic targets. Furthermore,
since the list of essential genes came from a eukaryote, there is a significant chance that these
genes would also be found in the human genome.

Automatic Method to Identify Drug Targets from Functional Linkages

One aspect of the invention provides a computational method to identify 
potential drug targets among the proteins expressed by a genome. This aspect takes 
advantage of the functional linkages calculated between genes in a genome using the 
methods described herein, as well as the detection of sequence homology and the knowledge
of a set of lethal or "essential" genes in one or more organisms.

To identify drug targets in a query genome, the sequence homology between
all of the genes in that genome and all of the genes in the genome of an organism for which
essential genes are known is calculated. For example, as discussed herein, the query genome
is Mycobacterium tuberculosis (TB) and the genome with known essentials is the yeast S.
cerevisiae. Sequence homology between all TB genes and all yeast genes was calculated
using the methods of the invention.

"Equivalent" or "orthologous" genes were also identified by another aspect of 
the invention that comprises doing a reverse sequence search (e.g., yeast vs. TB) and then 
choosing pairs of genes that are each other's best-scoring match (symmetric best hits). In one
exemplary aspect, MTB orthologs of Saccharomyces cerevisiae genes were generated by
finding all pairs of genes (TBi, SCj) where TBi was the top hit from a BLAST search of the
yeast gene SCj against the MTB genome, SCj was the top hit from a BLAST search of the
MTB gene TBi against the Saccharomyces cerevisiae genome, and both top hits had a
BLAST E-value <= 1 x 10^-3.
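
The symmetric top-hit selection described above can be sketched as follows. This is a minimal illustration only, not the implementation used in the invention. It assumes the BLAST results have already been parsed into dictionaries mapping each query gene to a list of (subject gene, E-value) pairs. The function names and the gene identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed input shape: query gene -> list of (subject gene, E-value) hits.
Hits = Dict[str, List[Tuple[str, float]]]


def top_hits(hits: Hits, max_evalue: float) -> Dict[str, str]:
    """Return the lowest-E-value subject for each query, if it passes the cutoff."""
    best: Dict[str, str] = {}
    for query, candidates in hits.items():
        if not candidates:
            continue
        subject, evalue = min(candidates, key=lambda hit: hit[1])
        if evalue <= max_evalue:
            best[query] = subject
    return best


def symmetric_best_hits(
    tb_vs_sc: Hits, sc_vs_tb: Hits, max_evalue: float = 1e-3
) -> List[Tuple[str, str]]:
    """Return (TB gene, yeast gene) pairs that are each other's top hit."""
    tb_best = top_hits(tb_vs_sc, max_evalue)
    sc_best = top_hits(sc_vs_tb, max_evalue)
    return sorted(
        (tb, sc) for tb, sc in tb_best.items() if sc_best.get(sc) == tb
    )


# Hypothetical toy data: TB1 and SC1 are mutual top hits; TB2 is not SC1's top hit;
# TB3 and SC3 fail the E-value cutoff.
tb_vs_sc = {
    "TB1": [("SC1", 1e-20), ("SC2", 1e-5)],
    "TB2": [("SC1", 1e-8)],
    "TB3": [("SC3", 0.5)],
}
sc_vs_tb = {
    "SC1": [("TB1", 1e-18), ("TB2", 1e-7)],
    "SC2": [("TB1", 1e-4)],
    "SC3": [("TB3", 0.2)],
}
orthologs = symmetric_best_hits(tb_vs_sc, sc_vs_tb)
```

With the toy data, only the pair (TB1, SC1) survives, because the top hits must agree in both search directions and both must pass the E-value cutoff.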

For example, a TB gene is an ortholog of a yeast gene if the yeast gene is the
best scoring sequence match when yeast is searched with the TB gene, and the TB gene is the
best scoring sequence match when TB is searched with the yeast gene. We define these
"symmetric" pairs as "orthologs."

After identifying orthologs between the query genome and the genome with
known essential genes, a set of query genome genes that are orthologs of known essential
genes in the other genome was chosen. These genes were designated the set of "putative
essentials." For the purposes of the algorithm of the invention, these query genome genes are
assumed to be essential genes, since they are the equivalents of essential genes in another
genome. These genes act as "markers" or indicators of essential pathways in the query
genome. One could supplement this set with genes already known to be essential in the
query organism. Functional linkages (determined by the methods of the invention) between
all query genome genes were examined. The query genome genes linked to the
putative essential genes were identified. This set of genes was designated as the "predicted
members of essential pathways." These genes are likely to be involved in important
pathways, since the (predicted) pathways have members that are putative essentials. Lastly,
the method removes from the set of genes in predicted essential pathways all of those genes
that have sequence homology to eukaryotic genes or proteins. The genes that remain after
this filtering step are the predicted drug targets for the query organism.
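
The three steps just described (mark putative essentials, collect their functionally linked genes, filter out genes with eukaryotic homologs) can be sketched as below. This is an illustrative outline under stated assumptions, not the invention's implementation: functional links are assumed to be supplied as undirected gene pairs, a link to any one putative essential is assumed to suffice for pathway membership, and all names and identifiers are hypothetical.

```python
from typing import Iterable, Set, Tuple


def predict_drug_targets(
    orthologs: Iterable[Tuple[str, str]],
    reference_essentials: Set[str],
    links: Iterable[Tuple[str, str]],
    eukaryotic_homologs: Set[str],
    known_essentials: Iterable[str] = (),
) -> Set[str]:
    """Return query-genome genes predicted as drug targets.

    orthologs            -- (query gene, reference gene) ortholog pairs
    reference_essentials -- genes known to be essential in the reference genome
    links                -- undirected functional links between query genes
    eukaryotic_homologs  -- query genes with homology to eukaryotic sequences
    known_essentials     -- optional genes already known essential in the query
    """
    # Step 1: putative essentials are orthologs of known essential genes.
    putative = {q for q, ref in orthologs if ref in reference_essentials}
    putative.update(known_essentials)

    # Step 2: predicted members of essential pathways are the genes linked
    # to a putative essential (assumption: one link is enough).
    pathway_members: Set[str] = set()
    for a, b in links:
        if a in putative:
            pathway_members.add(b)
        if b in putative:
            pathway_members.add(a)

    # Step 3: remove genes with homology to eukaryotic genes or proteins.
    return pathway_members - eukaryotic_homologs


# Hypothetical toy input.
targets = predict_drug_targets(
    orthologs=[("TB1", "SC1"), ("TB2", "SC2")],
    reference_essentials={"SC1"},
    links=[("TB1", "TB5"), ("TB1", "TB6"), ("TB2", "TB7"), ("TB5", "TB8")],
    eukaryotic_homologs={"TB1", "TB2", "TB6"},
)
```

In the toy input only TB5 remains: TB6 is linked to the putative essential TB1 but has a eukaryotic homolog, TB7 is linked only to a non-essential ortholog, and TB8 has no direct link to a putative essential.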

As a benchmark, this method was applied to the M. tuberculosis genome. Of
the over 3900 genes in TB, 11 were identified as potential drug targets. Comparing this list
of 11 predicted targets to the fewer than 10 known anti-TB drug targets, one gene was a
known drug target and one was linked to a known drug target. Accordingly, the algorithm of
the invention performed significantly better than a random choice of genes.
A rough estimate of statistical significance suggests that one would expect to see 2 of 10
known drug targets in a sample of 11 out of 3900 genes only 3.8 times out of 10,000 trials
(probability of occurring by random chance of 3.8 x 10^-4). Therefore, this embodiment of the
method is an entirely computational algorithm drawing on the demonstrated ability of the
general methods of the invention to predict functional linkages between genes and to
effectively identify drug targets in bacteria. The effectiveness of this method to identify
novel drug targets was clearly demonstrated when the algorithm was applied to the M.
tuberculosis genome.
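
The text does not state how the rough significance estimate was computed. One way to approximate it, offered here only as an assumption, is a hypergeometric model: draw 11 genes at random from 3900, of which 10 are known drug targets, and ask how often at least 2 of the drawn genes are known targets. The function name below is hypothetical.

```python
from math import comb


def prob_at_least(k_min: int, population: int, marked: int, sample: int) -> float:
    """Hypergeometric tail: P(at least k_min marked items in a random sample)."""
    total = comb(population, sample)
    upper = min(marked, sample)
    favorable = sum(
        comb(marked, k) * comb(population - marked, sample - k)
        for k in range(k_min, upper + 1)
    )
    return favorable / total


# Figures taken from the benchmark above: 3900 genes, 10 known targets,
# 11 predicted targets, at least 2 overlaps.
p_value = prob_at_least(2, population=3900, marked=10, sample=11)
print(f"{p_value:.2e}")
```

Under this model the probability comes out in the same order of magnitude as the figure quoted above. It need not match exactly, since the original estimate may have used a different approximation.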

The specific inhibition of the MTB homologs might be difficult. To address
this issue, using the methods of the invention, functional links to the essential genes were
searched. Functional links were selected which either do not have homologs in yeast, or the
enzymatic activity of their products is known to be absent in human cells. Using the
highest confidence data, functional links for 23 of the genes (indicated in bold in Table 1)
were found. Eight of these were linked to 12 unique MTB genes that satisfied the criteria
of the invention's methods (Table 1). Exemplary findings include:

(1) the gene folP, which encodes the enzyme dihydropteroate synthase
(DHPS) known to be the target of sulfonamide antibacterial drugs. Although it is found in
some eukaryotes, DHPS activity is not found in human cells (see, e.g., Huovinen (1995)
Antimicrob. Agents Chemother. 39:279-289),

(2) the product of the gene folK, a 7,8-dihydro-6-hydroxymethylpterin
pyrophosphokinase, has recently been proposed as a target for broad-spectrum
antibacterial drugs (see, e.g., Stammers (1999) FEBS Lett. 456:49-53).

(3) the gene gpsI is not only strongly linked to the essential yeast gene pepR,
but it is also functionally linked to inhA, the target of the drug isoniazid (see above), making
it a very compelling candidate for drug design.

Table 2. Subset of genes from Table 1 that are functionally linked to genes without
yeast homologs.

Gene      Link*      Comments
Rv0005    Rv0002     dnaN  DNA polymerase III, β-subunit
          Rv0003     recF  DNA replication and SOS induction
          Rv0006     gyrA  DNA gyrase subunit A
Rv0350    Rv0351     grpE  stimulates DnaK ATPase activity
          Rv0352     dnaJ  acts with GrpE to stimulate DnaK ATPase
Rv1010    Rv1008     Similar to E. coli hypothetical protein YefH
          Rv1009     Possible lipoprotein, similar to various other MTB proteins
          Rv101X     Similar to E. coli hypothetical protein YcbH
Rv2439c   Rv2427c    proA  γ-glutamyl phosphate reductase
          Rv2440c    obg   Obg GTP-binding protein
          Rv2441c    rpmA  50S ribosomal protein L27
          Rv2442c    rplU  50S ribosomal protein L21
Rv2782c   Rv2783c    gpsI  pppGpp synthase and polyribonucleotide phosphorylase
Rv3598c   Rv3600c    similar to Bacillus subtilis hypothetical protein YacB
          Rv3606c    folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
          Rv3607c    folX  may be involved in folate biosynthesis
          Rv3608c*   folP  dihydropteroate synthase (DHPS)
          Rv3610c    ftsH  inner membrane protein, chaperone
Rv3608c   Rv3598c    lysS  lysyl-tRNA synthase
          Rv3600c    similar to Bacillus subtilis hypothetical protein YacB
          Rv3606c    folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
          Rv3607c    folX  may be involved in folate biosynthesis
          Rv3609c    folE  GTP cyclohydrolase I
          Rv3610c    ftsH  inner membrane protein, chaperone
Rv3609c   Rv3606c    folK  7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase
          Rv3607c    folX  may be involved in folate biosynthesis
          Rv3608c*   folP  dihydropteroate synthase (DHPS)

Genes without yeast homologs shown in boldface

DHPS activity is found in some eukaryotic cells but not in human cells

In summary, the methods of the invention allowed identification of this
combination of functional linkages to essential genes. This information, together with the
lack of eukaryotic homologs for these genes, makes this group of proteins promising drug
targets, particularly because their inhibition is expected to disrupt vital bacterial processes
with a low likelihood of toxicity from the inhibition of a host equivalent.

Computer Implementation

The various techniques, methods, and aspects of the invention described 
herein can be implemented in part or in whole using computer-based systems and methods. 
Additionally, computer-based systems and methods can be used to augment or enhance the 
functionalities and algorithms described herein, increase the speed at which the functions can
be performed, and provide additional features and aspects as a part of or in addition to those 
of the invention described elsewhere in this document. Various exemplary computer-based 
systems, methods and implementations in accordance with the above-described technology 
are presented herein. 

The processor-based system can include a main memory, such as a random
access memory (RAM), and can also include a secondary memory. The secondary memory
can include, for example, a hard disk drive and/or a removable storage drive, representing a
floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage
drive reads from and/or writes to a removable storage medium. Removable storage media
can be a floppy disk, magnetic tape, an optical disk, and the like, which can be read by and
written to by the removable storage drive. The removable storage media can include a
computer usable storage medium having stored therein computer software and/or data.

In alternative embodiments, secondary memory may include other similar
means for allowing computer programs or other instructions to be loaded into a computer
system. Such means can include, for example, a removable storage unit and an interface.
Examples of such can include a program cartridge and cartridge interface (such as that found
in video game devices), a removable memory chip (such as an EPROM, or PROM) and
associated socket, and other removable storage units and interfaces that allow software and
data to be transferred from the removable storage unit to the computer system.

The computer system can also include a communications interface.
Communications interfaces allow software and data to be transferred between the computer
system and external devices. Examples of communications interfaces include modems,
network interfaces (such as, for example, an Ethernet card), communications ports, PCMCIA
slots and cards, and the like. Software and data transferred via a communications interface
can be in the form of signals that can be electronic, electromagnetic, optical or other signals
capable of being received by a communications interface. These signals can be provided to
the communications interface via a channel capable of carrying signals and can be implemented
using a wireless medium, wire or cable, fiber optics or other communications medium. Some
examples of a channel can include a phone line, a cellular phone link, an RF link, a network
interface, and other communications channels.

As used herein, the terms "computer program medium" and "computer usable
medium" are used to generally refer to media such as a removable storage device, a disk 
capable of installation in a disk drive, and signals on a channel, or equivalents thereof. These 
computer program products are means for providing software or program instructions to 
computer systems. Computer programs (also called computer control logic) can be stored in 
main memory and/or secondary memory. Computer programs can also be received via a 
communications interface. Such computer programs, when executed, enable the computer 
system to perform the features of the present invention as discussed herein. Computer 
programs, when executed, enable the processor to perform the features of the present 
invention. Accordingly, in one aspect of the invention, such computer programs represent 
controllers of the computer system. 

In another aspect of the invention, the methods and algorithms are
implemented using software. The software may be stored in, or transmitted via, a computer
program product and loaded into a computer system using a removable storage drive, hard 
drive or communications interface. The control logic (software), when executed by the 
processor, causes the processor to perform the functions of the invention as described herein. 

In another aspect, the elements are implemented primarily in hardware using, 
for example, hardware components such as PALs, application specific integrated circuits 
(ASICs) or other hardware components. Implementation of a hardware state machine so as 
to perform the functions described herein will be apparent to persons skilled in the relevant
art(s). In yet another embodiment, elements are implemented using a combination of both
hardware and software. 

In another aspect, the computer-based methods can be accessed or 
implemented over the World Wide Web by providing access via a Web Page to the methods 
of the present invention. Accordingly, the Web Page is identified by a Uniform Resource
Locator (URL). The URL denotes both the server machine, and the particular file or page on
that machine. In this embodiment, it is envisioned that a consumer or client computer system
interacts with a browser to select a particular URL, which in turn causes the browser to send
a request for that URL or page to the server identified in the URL. Typically the server 
responds to the request by retrieving the requested page, and transmitting the data for that 
page back to the requesting client computer system (the client/server interaction is typically 
performed in accordance with the hypertext transfer protocol ("HTTP")). The selected page
is then displayed to the user on the client's display screen. The client may then cause the 
server containing a computer program of the present invention to launch an application 
comprising a method of the invention, for example, to identify a nucleic acid or a polypeptide 
sequence that may be a target for a drug comprising the steps of (a) providing a first nucleic 
acid or a polypeptide sequence that is known to be a drug target; (b) providing an algorithm 
capable of analyzing a functional relationship between nucleic acid or polypeptide sequences
selected from the group consisting of a "domain fusion" method, a "phylogenetic profile" 
method and a "physiologic linkage" method; and, (c) comparing the first nucleic acid or the 
polypeptide drug target sequence to a plurality of sequences using at least one algorithm to 
identify a second sequence that has a functional relationship to the first sequence, thereby 
identifying a nucleic acid or a polypeptide sequence that may be a target for a drug, based on 
a query sequence provided by the client.

Nucleic Acids and Polypeptides 

The invention also provides isolated nucleic acids and polypeptides 
comprising the sequences as set forth in Table 3 and Table 4 (below). As used herein,
"isolated," when referring to a molecule or composition, such as, e.g., an isolated infected
cell comprising a nucleic acid sequence derived from a library of the invention, means that 
the molecule or composition (including, e.g., a cell) is separated from at least one other 
compound, such as a protein, DNA, RNA, or other contaminants with which it is associated 
in vivo or in its naturally occurring state. Thus, a nucleic acid or polypeptide or peptide 
sequence is considered isolated when it has been isolated from any other component with 
which it is naturally associated. An isolated composition can, however, also be substantially 
pure. An isolated composition can be in a homogeneous state. It can be in a dry or an 
aqueous solution. Purity and homogeneity can be determined, e.g., using any analytical 
chemistry technique, as described herein.

The term "nucleic acid" or "nucleic acid sequence" refers to a
deoxyribonucleotide or ribonucleotide oligonucleotide, including single- or double-stranded, or
coding or non-coding (e.g., "antisense") forms. The term encompasses nucleic acids, i.e.,
oligonucleotides, containing known analogues of natural nucleotides. The term also
encompasses nucleic-acid-like structures with synthetic backbones, see e.g., Oligonucleotides
and Analogues, a Practical Approach, ed. F. Eckstein, Oxford Univ. Press (1991); Antisense
Strategies, Annals of the N.Y. Academy of Sciences, Vol 600, Eds. Baserga et al. (NYAS
1992); Milligan (1993) J. Med. Chem. 36:1923-1937; Antisense Research and Applications
(1993, CRC Press), WO 97/03211; WO 96/39154; Mata (1997) Toxicol. Appl. Pharmacol.
144:189-197; Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense
Nucleic Acid Drug Dev 6:153-156. As used herein, the "sequence" of a nucleic acid or gene
refers to the order of nucleotides in the polynucleotide, including either or both strands (sense
and antisense) of a double-stranded DNA molecule, e.g., the sequence of both the coding
strand and its complement, or of a single-stranded nucleic acid molecule (sense or antisense).
For example, in alternative embodiments, promoters drive the transcription of sense and/or
antisense polynucleotide sequences of the invention, as exemplified by Table 3.

The terms "polypeptide," "protein," and "peptide" include compositions of the
invention that also include "analogs" or "conservative variants" and "mimetics"
("peptidomimetics") with structures and activity that substantially correspond to the
exemplary sequences, such as the sequences in Table 4. Thus, the terms "conservative
variant" or "analog" or "mimetic" also refer to a polypeptide or peptide which has a modified
amino acid sequence, such that the change(s) do not substantially alter the polypeptide's (the
conservative variant's) structure and/or activity (e.g., immunogenicity, ability to bind to
human antibodies, etc.), as defined herein. These include conservatively modified variations
of an amino acid sequence, i.e., amino acid substitutions, additions or deletions of those
residues that are not critical for protein activity, or substitution of amino acids with residues
having similar properties (e.g., acidic, basic, positively or negatively charged, polar or
non-polar, etc.) such that the substitutions of even critical amino acids do not substantially alter
structure and/or activity. Conservative substitution tables providing functionally similar
amino acids are well known in the art. For example, one exemplary guideline to select
conservative substitutions includes (original residue followed by exemplary substitution):
ala/gly or ser; arg/lys; asn/gln or his; asp/glu; cys/ser; gln/asn; glu/asp; gly/ala or pro;
his/asn or gln; ile/leu or val; leu/ile or val; lys/arg or gln or glu; met/leu or tyr or ile; phe/met
or leu or tyr; ser/thr; thr/ser; trp/tyr; tyr/trp or phe; val/ile or leu. An alternative exemplary
guideline uses the following six groups, each containing amino acids that are conservative
substitutions for one another: 1) Alanine (A), Serine (S), Threonine (T); 2) Aspartic acid (D),
Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5)
Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and 6) Phenylalanine (F), Tyrosine
(Y), Tryptophan (W); (see also, e.g., Creighton (1984) Proteins, W.H. Freeman and
Company; Schulz and Schirmer (1979) Principles of Protein Structure, Springer-Verlag). One
of skill in the art will appreciate that the above-identified substitutions are not the only
possible conservative substitutions. For example, for some purposes, one may regard all
charged amino acids as conservative substitutions for each other whether they are positive or
negative. In addition, individual substitutions, deletions or additions that alter, add or delete
a single amino acid or a small percentage of amino acids in an encoded sequence can also be
considered "conservatively modified variations."
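
As an illustration of the six-group guideline above, the following sketch classifies each substitution between two equal-length, ungapped sequences as conservative or not. It is a simplified example only: the function names and the sequences in the usage note are hypothetical, and residues outside the six groups (e.g., G, P, C, H) are treated as having no conservative partner, which is an assumption of this sketch rather than a statement of the guideline.

```python
from typing import List, Tuple

# The six groups from the alternative exemplary guideline (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("AST"),   # Alanine, Serine, Threonine
    set("DE"),    # Aspartic acid, Glutamic acid
    set("NQ"),    # Asparagine, Glutamine
    set("RK"),    # Arginine, Lysine
    set("ILMV"),  # Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues fall in the same group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(
    reference: str, variant: str
) -> List[Tuple[int, str, str, bool]]:
    """List (1-based position, original, replacement, conservative?) per change."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (no gaps handled)")
    changes = []
    for position, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref != var:
            changes.append((position, ref, var, is_conservative(ref, var)))
    return changes


# Hypothetical peptide pair for illustration.
changes = classify_substitutions("MKTAYIAKQR", "MRTSYLAKQG")
```

Here K to R, A to S and I to L are reported as conservative, while R to G is not. A broader rule, such as treating all charged residues as interchangeable, could be expressed by editing the group list.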

The terms "mimetic" and "peptidomimetic" refer to a synthetic chemical
compound that has substantially the same structural and/or functional characteristics of the 
polypeptides of the invention (e.g., ability to bind, or "capture," human antibodies in an 
ELISA). The mimetic can be either entirely composed of synthetic, non-natural analogues of 
amino acids, or, is a chimeric molecule of partly natural peptide amino acids and partly non- 
natural analogs of amino acids. The mimetic can also incorporate any amount of natural 
amino acid conservative substitutions as long as such substitutions also do not substantially 
alter the mimetic's structure and/or activity. As with polypeptides of the invention which are
conservative variants, routine experimentation will determine whether a mimetic is within the 
scope of the invention, i.e., that its structure and/or function is not substantially altered. 
Polypeptide mimetic compositions can contain any combination of non-natural structural 
components, which are typically from three structural groups: a) residue linkage groups other 
than the natural amide bond ("peptide bond") linkages; b) non-natural residues in place of 
naturally occurring amino acid residues; or c) residues which induce secondary structural 
mimicry, i.e., to induce or stabilize a secondary structure, e.g., a beta turn, gamma turn, beta 
sheet, alpha helix conformation, and the like. A polypeptide can be characterized as a
mimetic when all or some of its residues are joined by chemical means other than natural
peptide bonds. Individual peptidomimetic residues can be joined by peptide bonds, other 
chemical bonds or coupling means, such as, e.g., glutaraldehyde, N-hydroxysuccinimide 
esters, bifunctional maleimides, N,N'-dicyclohexylcarbodiimide (DCC) or
N,N'-diisopropylcarbodiimide (DIC). Linking groups that can be an alternative to the traditional
amide bond ("peptide bond") linkages include, e.g., ketomethylene (e.g., -C(=O)-CH2- for
-C(=O)-NH-), aminomethylene (CH2-NH), ethylene, olefin (CH=CH), ether (CH2-O),
thioether (CH2-S), tetrazole (CN4-), thiazole, retroamide, thioamide, or ester (see, e.g.,
Spatola (1983) in Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol.
7, pp 267-357, "Peptide Backbone Modifications," Marcel Dekker, NY). A polypeptide can
also be characterized as a mimetic by containing all or some non-natural residues in place of 
naturally occurring amino acid residues; non-natural residues are well described in the 
scientific and patent literature. 

The invention comprises nucleic acids comprising sequences as set forth in 
Table 3, or comprising nucleic acids encoding the polypeptides as set forth in Table 4, 
operably linked to a transcriptional regulatory sequence. As used herein, the term "operably 
linked," refers to a functional relationship between two or more nucleic acid (e.g., DNA) 
segments. Typically, it refers to the functional relationship of a transcriptional regulatory 
sequence to a transcribed sequence. For example, a promoter (defined below) is operably 
linked to a coding sequence, such as a nucleic acid of the invention, if it stimulates or 
modulates the transcription of the coding sequence in an appropriate host cell or other 
expression system. Generally, promoter transcriptional regulatory sequences that are
operably linked to a transcribed sequence are physically contiguous to the transcribed
sequence, i.e., they are cis-acting. However, some transcriptional regulatory sequences, such
as enhancers, need not be physically contiguous or located in close proximity to the coding 
sequences whose transcription they enhance. For example, in one embodiment, a promoter is 
operably linked to an ORF-containing nucleic acid sequence of the invention, as exemplified 
by, e.g., a nucleic acid sequence as set forth in Table 3. 

As used herein, the term "promoter" includes all sequences capable of driving 
transcription of a coding sequence in an expression system. Thus, promoters used in the 
constructs of the invention include cis-acting transcriptional control elements and regulatory
sequences that are involved in regulating or modulating the timing and/or rate of
transcription of a nucleic acid of the invention. For example, a promoter can be a cis-acting
transcriptional control element, including an enhancer, a promoter, a transcription terminator,
an origin of replication, a chromosomal integration sequence, 5' and 3' untranslated regions,
or an intronic sequence, which are involved in transcriptional regulation. These cis-acting
sequences typically interact with proteins or other biomolecules to carry out (turn on/off,
regulate, modulate, etc.) transcription.

The invention comprises expression cassettes comprising nucleic acids 
comprising sequences as set forth in Table 3, or comprising nucleic acids encoding the 
polypeptides as set forth in Table 4. The term "expression vector" refers to any recombinant
expression system for the purpose of expressing a nucleic acid sequence of the invention in
vitro or in vivo, constitutively or inducibly, in any cell, including prokaryotic, yeast, fungal,
plant, insect or mammalian cell. The term includes linear or circular expression systems.
The term includes expression systems that remain episomal or integrate into the host cell
genome. The expression systems can have the ability to self-replicate or not, i.e., drive only
transient expression in a cell. The term includes recombinant "expression cassettes" which
contain only the minimum elements needed for transcription of the recombinant nucleic acid. 

Alignment Analysis of Sequences 

The nucleic acid and polypeptide sequences of the invention include genes 
and gene products identified and characterized by sequence identity analysis (i.e., by
homology) using the exemplary nucleic acid and protein sequences of the invention, 
including, e.g., those set forth in Tables 3 and 4. In alternative aspects of the invention, 
nucleic acids and polypeptides within the scope of the invention include those having 98%, 
95%, 90%, 85% or 80% sequence identity (homology) to the exemplary sequences as set 
forth in Tables 3 and 4. 

For sequence comparison, typically one sequence acts as a reference 
sequence, to which test sequences are compared. When using a sequence comparison 
algorithm, test and reference sequences are entered into a computer, subsequence coordinates 
are designated, if necessary, and sequence algorithm program parameters are designated. 
Default program parameters are used unless alternative parameters are designated herein. 
The sequence comparison algorithm then calculates the percent sequence identity for the test 
sequence(s) relative to the reference sequence, based on the designated or default program 
parameters. A "comparison window", as used herein, includes reference to a segment of any 
one of the number of contiguous positions selected from the group consisting of from 25 to 
600, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence 
may be compared to a reference sequence of the same number of contiguous positions after 
the two sequences are optimally aligned. Methods of alignment of sequences for comparison 
are well-known in the art. Optimal alignment of sequences for comparison can be conducted, 
e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), 
by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), 
by the search for similarity method of Pearson & Lipman, Proc. Natl. Acad. Sci. USA 
85:2444 (1988), by computerized implementations of these algorithms (CLUSTAL, GAP, 
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics 
Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual 
inspection. 
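
By way of illustration only, the percent sequence identity calculation described above can be sketched with a simple global alignment in the manner of Needleman & Wunsch. The function names, the scoring values (match +1, mismatch -1, linear gap penalty -2) and the choice of the alignment length as the denominator are assumptions made for this sketch; they are not the parameters of any program named herein.

```python
def needleman_wunsch(ref, test, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences and return the two gapped strings.

    Scoring values are illustrative assumptions, not program defaults.
    """
    n, m = len(ref), len(test)
    # score[i][j] holds the best score for aligning ref[:i] with test[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == test[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == test[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                a.append(ref[i - 1])
                b.append(test[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(test[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def percent_identity(ref, test):
    """Percent of aligned columns that are identical (gap columns count as non-identical)."""
    a, b = needleman_wunsch(ref, test)
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def meets_identity_threshold(ref, test, threshold=80.0):
    """True if the test sequence has at least `threshold` percent identity to the reference."""
    return percent_identity(ref, test) >= threshold
```

In this sketch, a test sequence would be compared against an exemplary reference sequence and checked against one of the identity levels recited above (e.g., 80%); a production analysis would use one of the named programs with its own parameters.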

In one aspect of the invention (in the methods of the invention, and to 
determine if a sequence is within the scope of the invention), a CLUSTAL algorithm is used, 
e.g., the CLUSTAL W program, see, e.g., Thompson (1994) Nuc. Acids Res. 22:4673-4680; 
Higgins (1996) Methods Enzymol. 266:383-402. Variations can also be used, such as 
CLUSTAL X, see Jeanmougin (1998) Trends Biochem. Sci. 23:403-405; Thompson (1997) 
Nucleic Acids Res. 25:4876-4882. In one aspect, the CLUSTAL W program described by 
Thompson (1994) supra, is used with the following parameters: K tuple (word) size: 1, 
window size: 5, scoring method: percentage, number of top diagonals: 5, gap penalty: 3, to 
determine whether a nucleic acid has sufficient sequence identity to an exemplary sequence 
to be within the scope of the invention. In another aspect, the algorithm PILEUP is used in the 
methods and to determine whether a nucleic acid has sufficient sequence identity to be within 
the scope of the invention. This program creates a multiple sequence alignment from a group 
of related sequences using progressive, pairwise alignments to show relationship and percent 
sequence identity. It also plots a tree or dendrogram showing the clustering relationships used 
to create the alignment. PILEUP uses a simplification of the progressive alignment method 
of Feng & Doolittle, J. Mol. Evol. 35:351-360 (1987). The method used is similar to the 
method described by Higgins & Sharp, CABIOS 5:151-153 (1989). Using PILEUP, a 
reference sequence (e.g., an exemplary GCA-associated sequence of the invention) is 
compared to another sequence to determine the percent sequence identity relationship (i.e., 
that the second sequence is substantially identical and within the scope of the invention) 
using the following parameters: default gap weight (3.00), default gap length weight (0.10), 
and weighted end gaps. In one embodiment, PILEUP obtained from the GCG sequence 
analysis software package, e.g., version 7.0 (Devereux (1984) Nuc. Acids Res. 12:387-395), 
using the parameters described therein, is used in the methods and to identify nucleic acids 
within the scope of the invention. In another aspect, a BLAST algorithm is used (in the 
methods, e.g., to determine percent sequence identity (i.e., substantial similarity or identity) 
and whether a nucleic acid is within the scope of the invention), see, e.g., Altschul (1990) J. 
Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available 
through the National Center for Biotechnology Information, NIH. This algorithm involves 
first identifying high scoring sequence pairs (HSPs) by identifying short words of length W 
in the query sequence, which either match or satisfy some positive-valued threshold score T 
when aligned with a word of the same length in a database sequence. T is referred to as the 
neighborhood word score threshold (Altschul (1990) supra). These initial neighborhood 
word hits act as seeds for initiating searches to find longer HSPs containing them. The word 
hits are then extended in both directions along each sequence for as far as the cumulative 
alignment score can be increased. Cumulative scores are calculated using, for nucleotide 
sequences, the parameters M (reward score for a pair of matching residues; always > 0) and 
N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring 
matrix is used to calculate the cumulative score. Extension of the word hits in each direction 
is halted when: the cumulative alignment score falls off by the quantity X from its 
maximum achieved value; the cumulative score goes to zero or below, due to the 
accumulation of one or more negative-scoring residue alignments; or the end of either 
sequence is reached. The BLAST algorithm parameters W, T, and X determine the 
sensitivity and speed of the alignment. In one embodiment, to determine if a nucleic acid 
sequence is within the scope of the invention, the BLASTN program (for nucleotide 
sequences) is used incorporating as defaults a wordlength (W) of 11, an expectation (E) of 
10, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP 
program uses as default parameters a wordlength (W) of 3, an expectation (E) of 10, and the 
BLOSUM62 scoring matrix (see, e.g., Henikoff (1989) Proc. Natl. Acad. Sci. USA 
89:10915). 
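
As a non-limiting sketch of the word-hit extension step described above, the following shows an ungapped rightward extension of a nucleotide seed using the M and N scores and the three halting conditions. The function name, the assumption that the seed word is an exact match, the treatment of "falls off by X" as a drop of at least X, and the value X=20 are assumptions for illustration; this is not the BLAST program itself.

```python
def extend_right(query, subject, q_start, s_start, word_len, M=5, N=-4, X=20):
    """Extend an exact word hit to the right without gaps.

    Returns (best_score, best_length), where best_length counts residues
    from the start of the seed. X=20 is a hypothetical drop-off value.
    """
    # Assumption: the seed word matches exactly, so it scores word_len * M.
    score = word_len * M
    best_score, best_len = score, word_len
    i = q_start + word_len
    j = s_start + word_len
    # Halting condition 3: stop at the end of either sequence.
    while i < len(query) and j < len(subject):
        score += M if query[i] == subject[j] else N
        if score > best_score:
            best_score = score
            best_len = i - q_start + 1
        # Halting condition 1: score fell off by X from its maximum.
        # Halting condition 2: cumulative score went to zero or below.
        if best_score - score >= X or score <= 0:
            break
        i += 1
        j += 1
    return best_score, best_len
```

For example, extending a four-residue seed at the start of `ACGTACGTTTTT` against `ACGTACGTAAAA` yields a best score of 40 over eight residues under these assumptions; a full implementation would extend leftward as well and use a scoring matrix for amino acid sequences.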

Hybridization for Identifying Nucleic Acids of the Invention 

Nucleic acids within the scope of the invention include isolated or 
recombinant nucleic acids that specifically hybridize under stringent hybridization conditions 
to an exemplary nucleic acid of the invention (including a sequence encoding an exemplary 
polypeptide) as set forth in Tables 3 and 4. Stringent conditions are sequence-dependent and 
will be different in different circumstances. Longer sequences hybridize specifically at 
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in, 
e.g., Tijssen (1993) infra. Generally, stringent conditions are selected to be about 5 to 10°C 
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic 
strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic 
acid concentration) at which 50% of the probes complementary to the target hybridize to the 
target sequence at equilibrium (as the target sequences are present in excess, at Tm, 50% of 
the probes are occupied at equilibrium). Stringent conditions will be those in which the salt 
concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion 
concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for 
short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater 
than 50 nucleotides). Stringent conditions may also be achieved with the addition of 
destabilizing agents such as formamide. 

For selective or specific hybridization, a positive signal (e.g., identification of 
a nucleic acid of the invention) is about 10 times background hybridization. "Stringent" 
hybridization conditions that are used to identify substantially identical nucleic acids within 
the scope of the invention include hybridization in a buffer comprising 50% formamide, 5x 
SSC, and 1% SDS at 42°C, or hybridization in a buffer comprising 5x SSC and 1% SDS at 
65°C, both with a wash of 0.2x SSC and 0.1% SDS at 65°C. Exemplary "moderately 
stringent hybridization conditions" include a hybridization in a buffer of 40% formamide, 1 
M NaCl, and 1% SDS at 37°C, and a wash in 1X SSC at 45°C. Those of ordinary skill will 
readily recognize that alternative but comparable hybridization and wash conditions can be 
utilized to provide conditions of similar stringency. Nucleic acids which do not hybridize to 
each other under stringent hybridization conditions are still substantially identical if the 
polypeptides which they encode are substantially identical. This may occur, e.g., when a 
copy of a nucleic acid is created using the maximum codon degeneracy permitted by the 
genetic code, as discussed herein (see discussion on "conservative substitutions"). However, 
the selection of a hybridization format is not critical; it is the stringency of the wash 
conditions that determines whether a nucleic acid is within the 
scope of the invention. Wash conditions used to identify nucleic acids within the scope of 
the invention include, e.g.: a salt concentration of about 0.02 molar at pH 7 and a temperature 
of at least about 50°C or about 55°C to about 60°C; or, a salt concentration of about 0.15 M 
NaCl at 72°C for about 15 minutes; or, a salt concentration of about 0.2X SSC at a 
temperature of at least about 50°C or about 55°C to about 60°C for about 15 to about 20 
minutes; or, the hybridization complex is washed twice with a solution with a salt 
concentration of about 2X SSC containing 0.1% SDS at room temperature for 15 minutes 
and then washed twice by 0.1X SSC containing 0.1% SDS at 68°C for 15 minutes; or, 
equivalent conditions. See Sambrook, Tijssen and Ausubel (see below) for a description of 
SSC buffer and equivalent conditions. 
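
For convenience in comparing the SSC-based conditions above with the sodium ion limits recited for stringent conditions, the following sketch converts an SSC strength to an approximate sodium ion molarity. It assumes the conventional composition of 1X SSC (0.15 M NaCl and 0.015 M trisodium citrate), which is not stated herein, and the function names are hypothetical.

```python
# Assumed conventional composition of 1X SSC (not stated in this specification).
NACL_M_PER_1X = 0.15
TRISODIUM_CITRATE_M_PER_1X = 0.015


def ssc_sodium_molarity(ssc_strength):
    """Approximate Na+ molarity of an SSC solution of the given strength (e.g., 0.2 for 0.2X).

    Each trisodium citrate contributes three sodium ions.
    """
    per_1x = NACL_M_PER_1X + 3 * TRISODIUM_CITRATE_M_PER_1X
    return ssc_strength * per_1x


def within_stringent_salt_limit(ssc_strength, limit_m=1.0):
    """True if the Na+ molarity is below the 'about 1.0 M sodium ion' limit."""
    return ssc_sodium_molarity(ssc_strength) < limit_m
```

Under these assumptions, 5X SSC corresponds to roughly 0.975 M sodium ion and 0.2X SSC to roughly 0.039 M, so both fall below the approximately 1.0 M limit.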

General Techniques 

The nucleic acid and polypeptide sequences of the invention and other nucleic 
acids used to practice this invention, whether RNA, cDNA, genomic DNA, vectors, viruses 
or hybrids thereof, may be isolated from a variety of sources, genetically engineered, 
amplified, and/or expressed recombinantly. Any recombinant expression system can be 
used, including, in addition to bacterial cells, e.g., mammalian, yeast, insect or plant cell 
expression systems. 

Alternatively, these nucleic acids and polypeptides can be synthesized in vitro 
by well-known chemical synthesis techniques, as described in, e.g., Caruthers (1982) Cold 
Spring Harbor Symp. Quant. Biol. 47:411-418; Adams (1983) J. Am. Chem. Soc. 105:661; 
Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic. Biol. Med. 
19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979) Meth. Enzymol. 
68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra. Lett. 22:1859; U.S. 
Patent No. 4,458,066. 

Techniques for the manipulation of nucleic acids, such as, e.g., generating 
mutations in sequences, subcloning, labeling probes, sequencing, hybridization and the like 
are well described in the scientific and patent literature, see, e.g., Sambrook, ed., 
Molecular Cloning: a Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor 
Laboratory, (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed. John Wiley 
& Sons, Inc., New York (1997); Laboratory Techniques in Biochemistry and 
Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and 
Nucleic Acid Preparation, Tijssen, ed. Elsevier, N.Y. (1993). 

Polypeptides and peptides of the invention can also be synthesized, whole or 
in part, using chemical methods well known in the art. See e.g., Caruthers (1980) Nucleic 
Acids Res. Symp. Ser. 215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232; 
Banga, A.K., Therapeutic Peptides and Proteins, Formulation, Processing and Delivery 
Systems (1995) Technomic Publishing Co., Lancaster, PA. For example, peptide synthesis 
can be performed using various solid-phase techniques (see e.g., Roberge (1995) Science 
269:202; Merrifield (1997) Methods Enzymol. 289:3-13) and automated synthesis may be 
achieved, e.g., using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with 
the instructions provided by the manufacturer. 

The skilled artisan will recognize that individual synthetic residues and 
polypeptides incorporating mimetics can be synthesized using a variety of procedures and 
methodologies, which are well described in the scientific and patent literature, e.g., Organic 
Syntheses Collective Volumes, Gilman, et al. (Eds.) John Wiley & Sons, Inc., NY. 
Polypeptides incorporating mimetics can also be made using solid phase synthetic 
procedures, as described, e.g., by Di Marchi, et al., U.S. Pat. No. 5,422,426. Peptides and 
peptide mimetics of the invention can also be synthesized using combinatorial 
methodologies. Various techniques for generation of peptide and peptidomimetic libraries 
are well known, and include, e.g., multipin, tea bag, and split-couple-mix techniques; see, 
e.g., al-Obeidi (1998) Mol. Biotechnol. 9:205-223; Hruby (1997) Curr. Opin. Chem. Biol. 
1:114-119; Ostergaard (1997) Mol. Divers. 3:17-27; Ostresh (1996) Methods Enzymol. 
267:220-234. Modified peptides of the invention can be further produced by chemical 
modification methods, see, e.g., Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel 
(1995) Free Radic. Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896. 

Peptides and polypeptides of the invention can also be synthesized and 
expressed as fusion proteins with one or more additional domains linked thereto for, e.g., 
producing a more immunogenic peptide, to more readily isolate a recombinantly synthesized 
peptide, to identify and isolate antibodies and antibody-expressing B cells, and the like. 
Detection and purification facilitating domains include, e.g., metal chelating peptides such as 
polyhistidine tracts and histidine-tryptophan modules that allow purification on immobilized 
metals, protein A domains that allow purification on immobilized immunoglobulin, and the 
domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle 
WA). The inclusion of cleavable linker sequences such as Factor Xa or enterokinase 
(Invitrogen, San Diego CA) between the purification domain and GCA-associated peptide or 
polypeptide can be useful to facilitate purification. For example, an expression vector can 
include an epitope-encoding nucleic acid sequence linked to six histidine residues followed 
by a thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry 
34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues 
facilitate detection and purification while the enterokinase cleavage site provides a means for 
purifying the epitope from the remainder of the fusion protein. Technology pertaining to 
vectors encoding fusion proteins and application of fusion proteins are well described in the 
scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol., 12:441-53. 

The invention provides antibodies that specifically bind to the polypeptides of 
the invention, as set forth in Table 4. These antibodies can be useful in the screening 
methods of the invention. The polypeptides or peptides can be conjugated to another 
molecule or can be administered with an adjuvant. The coding sequence can be part of an 
expression cassette or vector capable of expressing the immunogen in vivo (see, e.g., 
Katsumi (1994) Hum. Gene Ther. 5:1335-9). Methods of producing polyclonal and 
monoclonal antibodies are known to those of skill in the art and described in the scientific 
and patent literature, see, e.g., Coligan, Current Protocols in Immunology, 
Wiley/Greene, NY (1991); Stites (eds.) Basic and Clinical Immunology (7th ed.) Lange 
Medical Publications, Los Altos, CA; Goding, Monoclonal Antibodies: Principles and 
Practice (2d ed.) Academic Press, New York, NY (1986); Harlow (1988) Antibodies, a 
Laboratory Manual, Cold Spring Harbor Publications, New York. 

Antibodies also can be generated in vitro, e.g., using recombinant antibody 
binding site expressing phage display libraries, in addition to the traditional in vivo methods 
using animals. See, e.g., Huse (1989) Science 246:1275; Ward (1989) Nature 341:544; 
Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys. 
Biomol. Struct. 26:27-45. Human antibodies can be generated in mice engineered to produce 
only human antibodies, as described by, e.g., U.S. Patent Nos. 5,877,397; 5,874,299; 
5,789,650; and 5,939,598. B-cells from these mice can be immortalized using standard 
techniques (e.g., by fusing with an immortalizing cell line such as a myeloma or by 
manipulating such B-cells by other techniques to perpetuate a cell line) to produce a 
monoclonal human antibody-producing cell. See, e.g., U.S. Patent Nos. 5,916,771; 5,985,615. 
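
The entries of Table 3 each begin with a header line of the form ">Rv0002 dnaN ... TB.seq 2052:3257 MW:42114". As a reading aid only, the following sketch parses such a header into its fields. The field names, the interpretation of a trailing "c" on the locus identifier as indicating the complementary strand, and the treatment of the MW value as daltons are assumptions inferred from the listing format rather than statements of this specification.

```python
import re

# Pattern for a Table 3 header line; the layout is inferred from the listing itself.
HEADER_RE = re.compile(
    r"^>(?P<locus>Rv\d+c?)\s+(?P<desc>.*?)\s*TB\.seq\s+"
    r"(?P<start>\d+):(?P<end>\d+)\s+MW:(?P<mw>\d+)\s*$"
)


def parse_table3_header(line):
    """Parse one Table 3 header line into a dict, or return None if it does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    start, end = int(m.group("start")), int(m.group("end"))
    locus = m.group("locus")
    return {
        "locus": locus,
        "description": m.group("desc"),
        "start": start,
        "end": end,
        "length_nt": end - start + 1,
        "mw": int(m.group("mw")),              # assumed to be in daltons
        "complement": locus.endswith("c"),     # assumed strand convention
    }
```

A header whose coordinates or molecular weight were split by stray spaces would not match this pattern and would return None, which gives a simple check for damaged header lines.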

TABLE 3 

>Rv0002 dnaN DNA polymerase III, b-subunit TB.seq 2052:3257 MW:42114 
>emb|AL123456|MTBH37RV:2052-3260, dnaN SEQ ID NO:1 

ATGGAC6CGGCTACGACAAGAGTTGGCCTCACCGACTTGACGTTTCGTTTGCTACGAGAGTCTT 
TCGCCGATGCGGTGTCGTGGGTGGCTAAAAATCTGCCAGCCAGGCCCGCGGTGCCGGTGCTCT 
CCGGCGTGTTGTTGACCGGCTCGGACMCGGTCTGACGATTTCCGGATTCGACTACGAGGTTTC 
CGCCGAGGCCCAGGTTGGCGCTGAMTTGTTTCT 

TTGTCCGATATTACCCGGGCGTTGCCTAACAAGCCCGTAGACGTTCATGTCGAAGGTAACCGGG 
TCGCATTGACCTGCGGTAACGCCAGGTTTTCGCTACCGACGATGCGAGTCGAGGATTATCCGAC 
GCTGCCGACGCTGCCGGAAGAGACCGGATTGTTGCCTGCGGAATTATTCGCCGAGGCAATCAG 
TCAGGTCGCTATCGCCGCCGGCGGGGACGACACGTTGCCTATGTTGACCGGCATCCGGGTCGA 
AATCCTCGGTGAGACGGTGGTTTTGGCCGCTACCGACAGGTTTCGCCTGGCTGTTGGAGAACTG 

AAGTGGTCGGCGTCGTCGCCAGATATCGAAGCGGCTGTGCTGGTCCCGGCCAAGACGCTGGC 
CGAGGCGGCCAMGCGGGCATCGGCGGCTCTGACGTTCGTTTGTCGTTGGGTACTGGQCCGG 
GGGTGGGCAAGGATGGCCTGCTCGGTATCAGTGGGAACGGCAAGCGCAGCACCACGCGACTT 
CTTGATGCCGAGTTCCCGAAGTTTCGGCAGTTGCTACCAACCGAACACACCGCGGTGGCCACC 
ATGGACGTGGCCGAGTTGATCGAAGCGATCAAGCTGGTTGCGTTGGTAGCTGATCGGGGCGCG 

CAGGTGCGCATGGAGTTCGCTGATGGCAGCGTGCGGCTTTCTGCGGGTGCCGATGATGTTGGA 
CGAGCCGAGGAAGATCTTGTTGTTGACTATGCCGGTGAACCATTGACGATTGCGTTTAACCCAA 
CCTATCTAACGGACGGTTTGAGTTCGTTGCGCTCGGAGCGAGTGTCTTTCGGGTTTACGACTGC 
GGGTAAGCCTGCCTTGCTACGTCCGGTGTGCGGGGACGATCGCCCTGTGGCGGGTCTGAATGG 
CAACGGTCCGTTCCCGGCGGTGTCGAGGGACTATGTCTATCTGTTGATGCCGGTTCGGTTGCCG 

GGCTGA 

>Rv0003 recF DNA replication and SOS induction TB.seq 3280:4434 MW:42181 
>emb|AL123456|MTBH37RV:3280-4437, recF SEQ ID NO:2 

GTGTACGTCCGTCATTTGGGGCTGCGTGACTTCCGGTCCTGGGCATGTGTAGATCTGGAATTGC 
ATCCAGGGCGGACGGTTTTTGTTGGGCCTMCGGTTATGGTAAGACGAATCTTATTGAGGCACT 
GTGGTATrCGACGACGTTAGGTTCGCACCGCGTTAGCGCCGATTTGCCGTTQATCCGGOTAGGT 

ACCGATCGTGCGGTGATCTCCACGATCGTGGTGAACGACGGTAGAGAATGTGCCGTCGACCTC 

GAGATCGCCACGGGGCGAGTCAACAAAGCGCGATTGAATCGATCATCGGTCCGAAGTACACGT 

GATGTGGTCGGAGTGCTTCGAGCTGTGTTGTTTGCCCCTGAGGATCTGGGGTTGGTTCGTGGG 

GATCCCGCTGACCGGCGGCGCTATCTGGATGATCTGGCGATCGTGCGTAGGCCTGCGATCGCT 

GCGGTACGAGCCGAATATGAGAGGGTGTTGCGCCAGCGGACGGCGTTATTGAAGTCCGTACCT 

GGAGCACGGTATCGGGGTGACCGGGGTGTGTTTGACACTCTTGAGGTATGGGACAGTCGTTTG 

GCGGAGCACGGGGCTGAACTGGTGGCCGCCCGCATCGATTTGGTCAACCAGTTGGCACCGGA 

AGTGAAGAAGGCATACCAGCTGTTGGCGCCGGAATCGCGATCGGCGTCTATCGGTTATCGGGC 

CAGCATGGATGTAAGCGGTCCCAGCGAGCAGTCAGATATCGATCGGCAATTGTTAGCAGCTCGG 

CTGTTGGCGGCGCTGGCGGCCCGTCGGGATGCCGAACTCGAGCGTGGGGTTTGTCTAGTTGGT 

CCGCACCGTGACGACCTAATACTGCGAGTAGGCGATCAACCCGC6AAAGGATTTGCTAGCCATG 

GGGAGGCGTGGTCGTTGGCGGTGGCACTGCGGTTGGCGGCCTATCAACTGTTACGCGTTGATG 

GTGGTGAGCCGGTGTTGTTGCTCGACGACGTGTTCGCCGMCTGGATGTCATGCGCCGTCGAG 

CGTTGGCGACGGCGGCCGAGTCCGCCGAACAGGTGTTGGTGACTGCCGCGGTGCTCGAGGAT 

ATTCCCGCCGGCTGGGACGCCAGGCGGGTGCACATCGATGTGCGTGCCGATGACACCGGATC 

GATGTCGGTGGTTCTGCCATGA 

>Rv0005 gyrB DNA gyrase subunit B TB.seq 5123:7264 MW:78441 
>emb|AL123456|MTBH37RV:5123-7267, gyrB SEQ ID NO:3 

ATGGGTAAAAACGAGGCCAGAAGATCGGCCCTGGCGCCCGATCACGGTACAGTGGTGTGCGAC 

CCCCTGCGGCGACTCAACCGCATGCACGCAACCCCTGAGGAGAGTATTCGGATCGTGGCTGCC 

CAGAAAAAGAAGGCCCAAGACGAATACGGCGGTGCGTCTATCACCATTCTCGAAGGGCTGGAG 

GCCGTCCGCAMCGTCCCGGCATGTACATTGGCTCGACCGGTGAGCK3CGGTTTACACCATCTC 

ATTTGGGAGGTGGTCGACAACGCGGTCGACGAGGCGATGGCCGGTTATGCAACCACAGTGAAC 

GTAGTGCTGCTTGAGGATGGCGGTGTCGAGGTCGCCGACGACGGCCGCGGCATTCCGGTCGC 

CACCCACGCCTCCGGCATACCGACCGTCGACGTGGTGATGACACAACTACATGCCGGCGGCAA 

GTTCGACTCGGACGCGTATGCGATATCTGGTGGTCTGCACGGCGTCGGCGTGTCGGTGGTTAA 

CGCGCTATCCACCCGGCTCGAAGTCGAGATCAAGCGCGACGGGTACGAGTGGTCTCAGGTTTA 

TGAGAAGTCGGAACCCCTGGGCCTCAAGCAAGGGGCGCCGACCAAGAAGACGGGGTCAACGG 

TGCGGTTCTGGGCCGACCCCGCTGTTTTCGAAACGACGGAATAGGACTTCGAAACCGTCGCCC 

GCCGGCTGCAAGAGATGGCGTTCCTCAACAAGGGGCTGACCATCAACCTGACCGACGAGAGGG 

TGACCCAAGACGAGGTCGTCGACGAAGTGGTCAGCGACGTCGCCGAGGCGCCGAAGTCGGCA 

AGTGAACGCGCAGCCGAATCCACTGCACCGCACAAAGTTAAGAGCCGCACCTTTCACTATCCGG 

GTGGCCTGGTGGACTTCGTGAAACACATCAACCGCACCAAGAACGCGATTCATAGCAGCATCGT 

GGACTTTTCCGGCAAGGGCACCGGGCACGAGGTGGAGATCGCGATGCAATGGAACGCCGGGT 

ATTCGGAGTCGGTGCACACCTTCGCCAACACCATCAACACCCACGAGGGCGGCACCCACGAAG 
AGGGCTTCCGCAGC6CGCTGACGTCGGTGGTGAACAAGTACGCCAAGGACCGCAAGCTACTGA 

AGGACAAGGACCCCAACCTCACCGGTGACGATATCCGGGAAGGCCTGGCCGCTGTGATCTCGG 

TGAAGGTCAGCGAACCGCAGTTCGAGGGCCAGACCAAGACCAAGTTGGGCAACACCGAGGTCA 

AATCGTTTGTGCAGAAGGTCTGTAACGAACAGCTGACCCACTGGTTTGAAGCCAACCCCACCGA 

CGCGAAAGTCGTTGTGAACAAGGCTGTGTCCTCGGCGCAAGCCCGTATCGCGGCACGTAAGGC 

ACGAGAGTTGGTGCGGCGTAAGAGCGCCACCGACATCGGTGGATTGCCCGGCAAGCTGGCCG 

ATTGCCGTTCCACGGATCCGCGCAAGTCCGAACTGTATGTCGTAGAAGGTGACTCGGCCGGCG 

GTTCTGCAAAAAGCGGTCGCGATTCGATGTTCCAGGCGATACTTCCGCTGCGCGGCAAGATCAT 

CAATGTGGAGAAAGCGCGCATCGACCGGGTGCTAAAGAACACCGAAGTTCAGGCGATCATCAC 

GGCGCTGGGCACCGGGATCCACGACGAGTTCGATATCGGCAAGCTGCGCTACCACAAGATCGT 

GCTGATGGCCGACGCCGATGTTGACGGCCMCATATTTCCACGCTGTTGTTGACGTTGTTGTTC 

CGGTTCATGCGGCCGCTGATCGAGAACGGGCATGTGTTTTTGGCACAACCGCCGCTGTACAAAC 

TCAAGTGGCAGCGCAGTGACCCGGAATTCGCATACTCCGACCGCGAGCGCGACGGTCTGCTGG 

AGGCGGGGCTGAAGGCCGGGAAGAAGATCAACAAGGAAGACGGCATTCAGCGGTACAAGGGT 

CTAGGTGAAATGGACGCTAAGGAGTTGTGGGAGACCACCATGGATCCCTCGGTTCGTGTGTTGC 

GTCAAGTGACGCTGGACGACGCCGCCGCCGCCGACGAGTTGTTCTCCATCCTGATGGGCGAGG 

ACGTCGACGCGCGGCGCAGCTTTATCACCCGCAACGCCAAGGATGTTCGGTTCCTGGATGTCTA 

A 

>Rv0006 gyrA DNA gyrase subunit A TB.seq 7302:9815 MW:92276 
>emb|AL123456|MTBH37RV:7302-9818, gyrA SEQ ID NO:4 

ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCGAG 

CAGGAGATGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCG 

GAGGTGCGCGACGGGCTCAAGCCCGTGCATCGCCGGGTGCTCTATGCAATGTTCGA7TCCGGC 

TTCCGCCCGGACCGCAGCCACGCCAAGTCGGCCCGGTCGGTTGCCGAGACCATGGGCAACTA 

CCACCCGCACGGCGACGCGTCGATCTACGACAGCCTGGTGCGCATGGCCCAGCCCTGGTCGC 

TGCGCTACCCGCTGGTGGACGGCCAGGGCAACTTCGGCTCGCCAGGCAATGACCCACCGGCG 

GCGATGAGGTACACCGAAGCCCGGCTGACCCCGTTGGCGATGGAGATGCTGAGGGAAATCGAC 

GAGGAGACAGTCGATTTCATCCCTAACTACGACGGCCGGGTGCAAGAGCCGACGGTGCTACCC 

AGCCGGTTCCCCAACCTGCTGGCCAACGGGTCAGGCGGCATCGCGGTCGGCATGGCAACCAAT 

ATCCCGCCGCACAACCTGCGTGAGCTGGCCGACGCGGTGTTCTGGGCGCTGGAGAATCACGAC 

GGCGACGAAGAGGAGACCCTGGCCGCGGTCATGGGGCGGGTTAAAG6CCCGGACTTCCCGAC 

CGCCGGACTGATC6TCGGATCCCAGGGCACCGCTGATGCCTACAAAACTGGCCGCGGCTCCAT 

TCGAATGCGCGGAGTTGTTGAGGTAGAAGAGGATTCCCGCGGTCGTACCTCGCTGGTGATCAC 

CGAGTTGCCGTATCAGGTCAAGCACGACAACTTGATCACTTCGATCGCCGAACAGGTCCGAGAC 

GGCAAGCTGGCCGGCATTTCCAACATTGAGGACCAGTCTAGCGATCGGGTCGGTTTACGCATC 

GTCATCGAGATCAAGCGCGATGCGGTGGCCAAGGTGGTGATCAATAACCTTTACAAGCACACCC 
AGCTGCAGACCAGCTTTGGCGCCAACATGCTAGCGATCGTCGACGGGGTGCCGCGCACGCTGC 
GGCTGGACCAGCTGATCCGCTATTACGTTGACCACCAACTCGACGTCATTGTGCGGCGCACCAC 
CTACCGGCTGCGCAAGGCAAACGAGCGAGCCCACATTCTGCGCGGCCTGGTTAAAGCGCTCGA 
CGCGCTGGACGAGGTCATTGCACTGATCCGGGCGTCGGAGACCGTCGATATCGCCCGGGCCG 

GACTGATCGAGCTGCTCGACATCGACGAGATCCAGGCCCAGGCAATCCTGGACATGCAGTTGC 
GGCGCCTGGCCGCACTGGAACGCCAGCGCATCATCGACGACCTGGCCAAAATCGAGGCCGAG 
ATCGCCGATCTGGAAGACATCCTGGCAAAACCCGAGCGGCAGCGTGGGATCGTGCGCGACGAA 
CTCGCCGAAATCGTGGACAGGCACGGCGACGACCGGCGTACCCGGATCATCGCGGCCGACGG 
AGACGTCAGCGACGAGGATTTGATCGCCCGCGAGGACGTCGTTGTCACTATCACCGAAACGGG 

ATACGCCAAGCGCACCAAGACCGATCTGTATCGCAGCCAGAAACGCGGCGGCAAGGGCGTGCA 
GGGTGCGGGGTTGAAGCAGGACGACATCGTCGCGCACTTCTTCGTGTGCTCCACCCACGATTT 
GATCCTGTTCTTCACCACCCAGGGACGGGTTTATCGGGCCAAGGCCTACGACTTGCCCGAGGC 
CTCCCGGACGGCGCGCGGGCAGCACGTGGCCAACCTGTTAGCCTTCCAGCCCGAGGAACGCA 
TCGCCCAGGTCATCCAGATTCGCGGCTACACCGACGCCCCGTACCTGGTGCTGGCCACTCGCA 

ACGGGCTGGTGAAAAAGTCCAAGCTGACCGACTTCGACTCCAATCGCTCGGGCGGAATCGTGG 
CGGTCAACCTGCGCGACAACGACGAGCTGGTCGGTGCGGTGCTGTGTTCGGCCGGCGACGAC 
CTGCTGCTGGTCTCGGCCAACGGGCAGTCCATCAGGTTCTCGGCGACCGACGAGGCGCTGCG 
GCCAATGGGTCGTGCCACCTCGGGTGTGCAGGGCATGCGGTTCAATATCGACGACCGGCTGCT 
GTCGCTGAACGTCGTGCGTGAAGGCACCTATCTGCTGGTGGCGACGTCAGGGGGCTATGCGAA 

ACGTACCGCGATCGAGGAATACCCGGTACAGGGCCGCGGCGGTAAAGGTGTGCTGACGGTCAT 
GTACGACCGCCGGCGCGGCAGGTTGGTTGGGGCGTTGATTGTCGACGACGACAGCGAGCTGT 
ATGCCGTCACTTCCGGCGGTGGCGTGATCCGCACCGCGGCACGCCAGGTTCGCAAGGCGGGA 
CGGCAGACCAAGGGTGTTCGGTTGATGAATCTGGGCGAGGGCGACACACTGTTGGCCATCGCG 
CGCAACGCCGAAGAAAGTG6CGACGATAATGCCGTGGACGCCAACGGCGCAGACCAGACGGG 

CAATTAA 



>Rv0014c pknB serine-threonine protein kinase TB.seq 15593:17470 MW:66511 
>emb|AL123456|MTBH37RV:c17470-15590, pknB SEQ ID NO:5 

ATGACCACCCCTTCCCACCTGTCCGACCGCTACGAACTTGGCGAAATCCTTGGATTTGGGGGCA 
TGTCCGAGGTCCACCTGGCCCGCGACCTCCGGTTGCACCGCGACGTTGCGGTCAAGGTGCTGC 
GCGCTGATCTAGCCCGCGATCCCAGTTTTTACCTTCGCTTCCGGCGTGAGGCGCAAAACGCCG 
CGGCATTGAACCACCCTGCAATCGTCGCGGTCTACGACACCGGTGAAGCCGAAACGCCCGCCG 
GGCCATTGCCCTACATCGTCATGGAATACGTCGACGGCGTTACCCTGCGCGACATTGTCCACAC 
CGAAGGGCCGATGACGCCCAAACGCGCCATCGAGGTCATCGCCGACGCCTGCCAAGCGCTGA 
ACTTCAGTCATCAGAACGGAATCATCCACCGTGACGTCAAGCCGGCGAACATCATGATCAGCGC 
GACCAATGCAGTAAAGGTGATGGATTTCGGCATCGCCCGCGCCATTGCCGACAGCGGCAACAG 
CGTGACCCAGACCGCAGCAGTGATCGGCACGGCGCAGTACGTGTCACCCGAACAGGCCCGGG 
GTGATTCCGTCGACGCCCGATGCGATGTCTATTCCTTGGGCTGTGTTCTTTAT6AAGTCCTCACC 
GGGGAGCCACCTTTCACCGGCGACTCACCCGTCTCGGTTGCCTACCAACATGTGCGCGAAGAC 
CCGATCCCACCTTCGGCGCGGCACGAAGGCCTCTCCGCCGACCTGGACGCCGTCGTTCTCAAG 
GCGCTGGCCAAAAATCCGGAAAACCGCTATCAGACAGCGGCGGAGATGCGCGCCGACCTGGTC 
CGCGT6CACAACGGTGAGCCGCCCGAGGCGCCCAAAGTGCTCACCGATGCCGAGCGGACCTC 
GCTGCTGTCGTCTGCGGCCGGCAACCTTAGCGGTCCGCGCACCGATCCGGTACCACGCCAGGA 
CTTAGACGACACCGACCGTGACCGCAGCATCGGTTCGGTGGGCCGTTGGGTTGCGGTGGTCGC 
CGTGCTCGCTGTGCTGACCGTCGTGGTAACCATCGCCATCAACACGTTCGGCGGCATCACCCG 
CGACGTTCAAGTTCCCGACGTTCGGGGTCAATCCTCCGCCGACGCCATCGCCACACTGCAAAA 

CCGGGGCTTCAAAATCCGCACCTTGCAGAAGCCGGACTCGAGAATCCCACCGGACCACGTTAT 
CGGCACCGACCCGGCCGCCAACACGTCGGTGAGTGCAGGCGACGAGATCACAGTCAACGTGT 
CCACCGGACCCGAGCAACGCGAAATACCCGACGTCTCCACGCTGACATACGCCGAAGCGGTCA 
AGAAACTGACTGCCGCCGGATTCGGCCGCTTGAAGCAAGCGAATTCGCCGTCCACCCCGGAAC 
TGGTGGGCAAGGTCATCGGGACCAACGCGCCAGCCAACCAGACGTCGGGCATCACCAATGTGG 

TGATCATCATCGTTGGCTCTGGTCCGGCGACCAAAGACATTCCCGATGTCGCGGGCGAGACCGT 
CGACGT6GCGCAGAAGAACCTCAACGTCTACGGGTTCAGCAAATTCAGTCAGGCCTCGGTGGA 
CAGCCCCCGTCCCGCCGGCGAGGTGACCGGCACCAATCCACCCGCAGGCACCACAGTTCCGG 
TCGATTCAGTCATCGAACTACAGGTGTCCAAGGGCAACCAATTCGTCATGCCCGACCTATCCGG 
CATGTTCTGGGTCGACGCCGAACCACGATTGCGCGCGCTGGGCTGGACCGGGATGCTCGACAA 

AGGGGCCGACGTCGACGCCGGTGGCTCCCAACACAAGGGGGTCGTCTATCAAAACCCGCCGG 
CGGGGACCGGCGTCAACCGGGACGGCATCATCACGCTGAGGTTCGGCGAGTAG 

>Rv0016c pbpA TB.seq 18762:20234 MW:51577 
>emb|AL123456|MTBH37RV:c20234-18759, pbpA SEQ ID NO:6 

ATGAACGCCTCTCTGCGCCGAATATCGGTGACCGTGATGGCGTTGATCGTGTTGCTACTGCTCA 
ACGCGACCATGACGCAGGTCTTCACCGCCGACGGGCTGCGTGCCGATCCCCGCAACGAGCGA 
GTGTTGCTCGACGAGTATTCACGGCAGCGCGGCCAGATCACCGCTGGTGGCCAACTGCTGGCG 
TACTCGGTAGCCACCGACGGCCGCTTTGGTTTCCTGCGGGTCTATCCCAATCCTGAGGTGTAGG 
CGCCGGTTACCGGCTTCTACTCCCTGCGCTATTCCAGCACCGCCCTAGAACGAGCCGAGGACC 

CGATATrGAACGGGTCCGACCGCCGTCTGTTCGGCCGCCGGCTGGCCGACrrCTTCACCGGTC 
GCGACCCACGCGGCGGTAATGTGGATACCAGGATCAACCCGCGCATTCAGCAAGCCGGCTGGG 
ACGCGATGCAGCAAGGCTGCTACGGGCCCTGTAAGGGAGCGGTGGTCGCCCTTGAGCCATCAA 
CCGGCMGATTTTGGCGTTGGTGTCTTCTCCGTCGTACGACCCCAACCTGCTGGCGTGGCATAA 
CCCCGAGGTGCAGGCGCAAGCCTGGCAGCGGCTTGGCGACAATCCCGCCTCTCCACTGACCAA 

CCGTGCCATCTCTGAGACGTATCGACCGGGTTCGACTTTCAAAGTGATCACGACTGCGGCCGCG 
CTGGCGGCCGGGGCCAGGGAGACCGAACAGCTGACTGCGGCGCCCACAATTCCGTTGCCAGG 
CAGCACCGCCCAGCTAGAGAACTACGGCGGTGCGCCGTGCGGGGACGAACCCACCGTGTCGC 
TGCGTGAQQCATTCGTCAAATCATGCAACACCGCATTCGTCCAGCTGGGCATCCGCACCGGCG 
CCGACGCCCTGCGCAGCATGGCGCGCGCGTTCGGTCTCGATAGCCCACCGCGCCCAACTCCG 
CTGCAAGTGGCGGAATCAACCGTCGGGCCTATCCCGGACAGCGCCGCACTAGGGATGACCAGT 
ATCGGCCAAAAGGACGTTGCGCTGACCCCGCTAGCGAACGCAGAAATAGCCGCGACCATCGCA 
AACGGCGGCATTACGATGAGGCCTTATCTAGTCGGCAGCCTCAAGGGACCGGACCTAGCCAAT 
ATCTCAACCACCGTCGGATACCAGCAGCGCCGCGCGGTGTCACCGCAGGTCGCCGCTAAGCTA 
ACAGAGCTGATGGTCGGCGCCGAGAAAGTCGCACAGCAGAAAGGGGCAATCCCCGGCGTGCA 
GATCGCATCCAAGACGGGCACCGCCGAACATGGCACCGACCCTCGTCACACTCCACCGCACGC 
TTGGTACATCGCCTTTGCGCCCGCACAAGCGCCCAAGGTGGCTGTTGCCGTGCTGGTGGAGAA 
CGGGGCTGATCGGCTGTCCGCCACCGGAGGTGCCCTCGCGGCACCGATCGGGCGGGCGGTG 
ATCGAAGCCGCACTGCAGGGGGAACCATGA 

>Rv0017c rodA TB.seq 20234:21640 MW:50612 
>emb|AL123456|MTBH37RV:c21640-20231, rodA SEQ ID NO:7 

ATGACGACACGACTGCAAGCGCCGGTGGCCGTAACGCCCCCGTTGCCGAGTCGGCGCAACGC 
TGMCTGCTGCTGCTGTGCTTTGCCGCCGTAATCACGTTTGCCGCACTGCTGGTCGTGCAGGCC 
AATCAAGACCAGGGGGTGCCCTGGGACTTGACTAGCTACGGACTGGCCTTCCTGACCCTGTTC 
GGATCCGCGCATCTGGCCATCCGGCGCTTCGCCCCCTACACTGACCCGCTGTTGCTCCCGGTG 
GTGGCACTGCTCAACGGACTTGGCCTGGTAATGATCCACCGCCTCGATCTGGTGGACAACGAG 

ATCGGCGAGCATCGGCACCCCAGCGCAAACCAGCAGATGCTGTGGACGCTGGTGGGCGTAGC 
TGCCTTCGCGCTCGTGGTGACCTTCCTCAAGGACCACCGACAGCTCGCACGCTACGGCTACATT 
TGCGGGCTCGCG6GTGTGGTTTTCTTGGCAGTTCCCGCGCTGCTCCCGGCAGCACTGTCCGAA 
CAGAACGGCGCCAAGATCTGGATCCGGTTGCCCGGCTTCTCGATTCMCCCGCCGAATTTTCAA 
AGATTCTGCTGCTGATCTTCTTTTCGGCGGTACTGGTGGCCAAACGCGGCCTGTTCACCAGCGC 

CGGCAAACATTTGCTCGGAATGACCCTGCCGCGCCCGCGAGACGTCGCGCCACTGTTGGCAGC 
CTGGGTCATCTCGGTGGGTGTGATGGTCTTCGAGAAAGACCTCGGCGCTTCGCTGCTGCTGTAG 
ACATCGTTTCTGGTGGTGGTTTACCTCGCCACCCAGCGGTTCAGTTGGGTCGTCATCGGCCTGA 
CTCTGTTCGCGGCAGGAACCTTGGTGGCGTACTTCATTTTTGAGCACGTCCGGCTCCGCGTACA 
GACCTGGCTGGATCCGTTCGCAGATCCAGACGGCACCGGATATCAGATCGTGCAGTCGCTTTTC 

AGCTTCGCTACAGGCGGTATCTTCGGCACCGGGCTCGGTAATGGTCAACCCGACACCGTGCCC 
GCGGCATCCACCGATTTCATCATCGCCGCGTTCGGCGMGAGCTTGGGTTGGTGGGCTTGACG 
GCCATCCTGATGCTCTACACCATCGTGATCATCCGGGGTTTGCGCACGGCCATCGCCACCCGC 
GATAGCTTCGGCAAGCTGCTGGCCGCCGGCCTCTCATCGACGCTAGCCATTCAGCTGTTCATCG 
TCGTCGGCGGTGTGACCCGACTCATTCCGCTGACCGGGTTGACCACACCGTGGATGTCCTACG 

GCGGGTCTTCACTGCTGGCCAACTACATATTGCTGGCCATCCTGGCACGCATCTCGCACGGAGC 
CCGCCGCCCACTGCGCACCCGCCCACGAAATAAGTCGCCGATTACGGCGGCCGGCACCGAGG 
TCATCGAACGCGTATGA 

>Rv0018c ppp TB.seq 21640:23181 MW:53781 
>emb|AL123456|MTBH37RV:c23181-21637, ppp SEQ ID NO:8 

GTGGCGCGCGTGACCCTGGTCCTGCGATACGCGGCGCGCAGCGATCGCGGCTTGGTACGCGC 

CAACAACGAAGACTCGGTCTACGCTGGGGCACGGCTATTGGCCCTGGCCGACGGCATGGGTG 
GGCATGCGGCCGGCGAGGTGGCGTCCCAGTTGGTGATTGCCGCATTGGCCCATCTCGATGACG 
ACGAGCCCGGTGGCGATCTGCTGGCCAAGCTGGATGCCGCGGTGCGCGCCGGCAACTCGGCT 
ATCGCAGCGCAAGTCGAGATGGAGCCCGATCTCGAAGGCATGGGTACCACGCTCACCGCAATC 
CTGTTCGCGGGCAACCGGCTCGGCCTGGTGCATATCGGTGACTCGCGCGGTTACCTGCTGCGC 

GACGGTGAGCTGACGCAGATCACCAAGGACGACACGTTTGTCCAAACGCTG6TCGACGAAGGC 
CGGATCACCCCGGAGGAGGCGCACAGCCACCCGCAACGCTCGTTGATCATGCGGGCGTTGAC 
CGGCCATGAGGTCGAACCGACGCTGACCATGCGAGAAGCCCGCGCCGGTGATCGTTACCTGCT 
GTGCTCGGACGGGTTGTCCGATCCGGTTAGCGATGAAACTATCCTCGAGGCCCTGCAGATCCC 
CGAGGTTGCCGAGAGCGCTCACCGCCTCATTGAACTGGCGCTGCGCGGCGGCGGCCCCGACA 

ACGTCACTGTCGTCGTCGCCGACGTCGTCGACTACGACTACGGCCAGACCCAACCGATTCTGG 
CCGGGGCGGTCTCAGGCGACGACGACCAACTGACCCTGCCCAACACCGCCGCCGGCCGGGCC 
TCTGCCATCAGCCAGCGCAAGGAGATCGTTAAACGCGTTCCGCCACAGGCCGATACATTCAGTC 
GGCCACGGTGGTCGGGCCGACGGCTAGCATTCGTTGTCGCACTGGTGACCGTGCTGATGACTG 
CGGGCCTGCTCATTGGTCGCGCGATCATCCGCAGCAACTACTACGTAGCGGACTACGCCGGCA 

GCGTGTCCATCATGCGGGGGATTCMGGGTCGCTACTGGGCATGTCCCTGCACCAGCCTTACC 
TGATGGGCTGCCTCAGCCCGCGTAACGAGCTGTCGCAGATCAGCTACGGACAGTCTGGGGGCC 
CTCTCGACTGCCATCTGATGAAACTGGAGGATCTGCGACCGCCGGAGCGCGCACAGGTTCGGG 
CCGGTCTCCCGGCCGGCACTCTCGATGACGCCATCGGGCAGTTGCX3CGAACTGGCGGCCAACT 
CCCTGCTGCCGCCTTGCCCGGCGCCGCGTGCCACGTCCCCGCCCGGGCGCCCGGCCCCACCC 

ACGACCAGCGAGACAACCGAACCAAACGTCACCTCCTCGCCAGCCTCTCCATCACCCACCACCT 
CCGCGCCGGCCCCCACCGGAACTACTCCTGCCATCCCCACGAGTGCCTCCCCGGCAGCGCCC 
GCGTCGCCGCCGACGCCTTGGCCCGTCACCAGCTCGCCGACGATGGCCGCACTTCCGCCACC 
CCCGCCTCAGCCGGGCATCGACTGCCGGGCGGCGGCATGA 

>Rv0019c - TB.seq 23273:23737 MW:17153 
>emb|AL123456|MTBH37RV:c23737-23270, Rv0019c SEQ ID NO:9 

ATGCAGGGGTTGGTACTGCAACTGACGCGTGCCGGATTCTTGATGTTGTTGTGGGTATTCATCT 
GGTCCGTGCTACGGATCTTGAAGACCGAGATTTATGCGCCGACCGGCGCGGTCATGATGCGCC 
GCGGCCTGGCGCTGCGAGGGACGCTCTTAG6CGCGCGTCAGCGCCGGCACGCTGCACGCTAC 
CTGGTGGTGACCGAAGGTGCGTTGACTGGCGCGCGTATCACGCTGAGCGAACAGCCGGTGTTG 
ATCGGGCGCGCCGACGACTCGACCCTGGTGCTGACCGACGACTACGCCTCGACGCGGCACGC 
TCGGCTGTCTATGCGCGGCTCCGAGTGGTACGTCGAAGATCTAGGATCGACCAACGGCACTTA 
CCTGGACAGGGCGAAGGTGACGACTGCGGTACGAGTTCCGATCGGAACGCCGGTTCGCATCG
GCAAAACTGCAATCGAGTTGCGCCCGTGA 

>Rv0020c - TB.seq 23864:25444 MW:56881
>emb|AL123456|MTBH37RV:c25444-23861, Rv0020c SEQ ID NO:10

ATGGGTAGCCAGAAAAGGCTGGTTCAGCGCGTTGAGCGCAAACTCGAGCAGACGGTTGGCGAT 
GCGTTTGCCCGCATCTTTGGAGGCTCGATCGTCCCGCAAGAGGTCGAAGCCCTGCTGCGCCGC 
GAGGCGGCCGACGGCATCCAGTCGCTGCAGGGAAATCGCCTTTTGGCGCCCAACGAATACATC 
ATTACCCTCGGTGTGCACGACTTTGAGAAGTTGGGCGCTGATCCTGAGCTGMGTCAACCGGTT 
TTGCTCGGGACTTGGCGGACTATATCCAAGAACAGGGGTGGCAAACGTATGGTGATGTGGTCGT

ccgattcgagcagtcgtcgaacctgcataccggccagttccgcgcccgcggcactgttaaccc 
cgacgttgagacccacccgccggtcatcgattgcgcccggccacaatcaaaccacgcgtttgg 
cgcagaaccaggagtagcaccaatgagtgacaattcgagctaccgtggcggtcaggggcaggg 
gcgtcccgacgagtattacgacgaccgctatgcgcgtccgcaagaggatccgcgtggtggccc 

ggatccgcaaggcggatctgacccccgcggggggtatccacccgagacgggcggctacccgc
cccagccgggctagccacgcccgcgccaccx;ggaccagggcgactaccccgagcaaatcggg 
taccccgaccagggcggttaccccgagcaacgcggttaccccgagcaacgcggctaccccga 
ccagcgcgggtaccaggaccagggtcgaggctaccccgaccaagggcaggggggctatccgc 
cgccctacgagcaacgccctcctgtttctcccggcccggctgccggctacggcgctcccggct 

acgaccagggctatcgccaaagcggcggctacggcccttcacccggtggcggccagcccggc
tacggcgggtacggggagtacgggcgtggcccggctcgccacgaggagggcagctatgtgcc 
ctctggccctccgggcccgcccgagcaacgaccggcttaccccgaccaaggcggttacgacc 
agggctaccagcaaggcgccacgacatacggccggcaagactatggcggcggcgctgactac 
acccgctacaccgaatccccgcgggtcccgggatacgctcctcagggtggcgggtacgccga 

acccgccggccgagactacgactacggccaatcaggcgctccggactacggtcagccagcgc
ccggtggctacagcggttacgggcagggcggctatgggtccgccggaacgtcggttacgctg 
cagctcgacgacggcagcggacgcac7taccagctccgcgagggctccaacatcatcggtcgc 
ggacaggacgcccagttccggctgcccgacaccggtgtgtcacgccgtcacttggagatccg 
gtgggacgggcaggtcgcattgctcgcagacctgaactccaccaacggcaccactgttaacaa 

tgcaccggtacaggagtggcagttggccgacggtgatgtgatccgcttgggacactccgagat
catcgtccgcatgcactga 

>Rv0032 bioF2 C-terminal similar to B. subtilis BioF TB.seq 34295:36607 MW:86245
>emb|AL123456|MTBH37RV:34295-36610, bioF2 SEQ ID NO:11
ATGCCCACTGGCTTGGGCTATGACTTTCTGCGCCCTGTCGAGGACTCGGGGATCAACGACCTGA
AGCACTATTACTTCATGGCGGATTTGGCCGATGGGCAACCGCTAGGCCGGGCAAACCTCTATAG 
CGTCTGTTTCGACCTGGCCACCACCGACCGCAAGCTCACTCCGGCCTGGCGAACGACCATCAA 
ACGGTGGTTTCCGGGGTTTATGACCTTCCGTTTCCTCGAGTGCGGGTTGCTCACCATGGTGAGC
AACCCGCTGGCGTTGCGGTCCGACACCGACTTGGAGCGGGTATTGCCTGTGCTGGCCGGCCAG 
ATGGACCAGTTGGCGCATGACGACGGGTCGGATTTCTTGATGATCCGGGACGTGGACCCGGAA 
CACTACCAGCGATACCTTGACATCCTGCGCCCGTTGGGCTTTCGGCCTGCGCTGGGCTTTTCCC 

GGGTAGACACGACCATCAGCTGGTCGAGCGTGGAAGAGGCACTGGGCTGCCTGTCTCACAAAA
GGCGCCTGCCGTTGAAGACGTCGCTGGAGTTTCGTGAGCGGTTCGGTATCGAGGTCGAGGAAC 
TCGACGAGTATGCCGAGCATGCGCCGGTATTGGCCCGGCTTTGGCGCAACGTCAAGACGGAGG 
CAAAGGATTACCAGCGCGAGGACCTGAACCCTGAGTTCTTCGCGGCGTGTTCTCGGCATCTGCA 
TGGACGTAGCAGACTGTGGTTGTTCCGCTACCAGGGCACGCCAATTGCCTTCTTTTTGAACGTTT 

GGGGTGCGGATGAGAACTACATACTGCTTGAGTGGGGCATCGATCGTGATTTTGAACATTATAG
GAAGGCGAATCTGTACCGGGCGGCGCTGATGCTCAGCCTAAAAGATGCGATCAGCCGAGATAA 
ACGGCGAATGGAAATGGGTATTACGAACTATTTCACAAAACTTCGCATTCCGGGTGCCCGAGTC 
ATACCGACCATCTATTTCCTGCGTCACAGCACGGATCCGGTGCATACGGCAACGTTAGCGCGAA 
TGATGATGCACAATATTCAACGGCCAACGCTACCCGACGATATGTCGGAGGAATTCTGTCGCTG 

GGAAGAGCGAATACGTCTGGACCAGGACGGGCTACCCGAACACGATATCTTTCGCAAGATCGAT
CGTCAGCACAAATACACGGGGCTCAAACTCGGCGGAGTCTACGGTTTTTATCCCCGATTCACCG 
GACCGCAGCGATCCACGGTCAAGGCCGCGGAGCTGGGCGAGATCGTGTTGCTGGGCACGAAC 
TCGTATCTGGGCCTGGCCACCCATCCAGAGGTGGTGGAGGCCTCGGCGGAGGCCACGCGACG 
GTACGGCACCGGCTGCTCGGGTTCGCCGTTGCTGAACGGCACGTTGGACTTGCACGTCTCGCT 

TGAGCAGGAACTAGCCTGTmTTGGGCAAACCCGCCGCCGTGTTGTGCTCCACCGGATATCAG
AGCAACCTGGCGGCGATCAGCGCGCTATGCGAATCCGGGGACATGATCATCCAAGACGCGCTG 
AACCACCGCAGCCTGTTCGACGCCGCCAGGTTGTCCGGGGCCGACTTCACCTTGTACCGGCAC 
AACGACATGGACCACCTGGCGCGGGTGCTACGCCGCACCGAGGGGCGCCGCCGGATCATCGT 
CGTGGACGCGGTGTTCAGCATGGAAGGCACCGTCGCCGACCTGGCCACCATCGCCGAGCTTG 

CCGACCGGCACGGCTGCCGGGTCTATGTGGACGAGTCCCATGCGCTGGGCGTGCTCGGCCCC
GACGGGCGAGGAGCTTCGGCCGCGTTGGGTGTCTTGGCGCGCATGGACGTGGTGATGGGCAC 
GTTCAGCAAATCCTTTGCCTCCGTCGGCGGGTTCATCGCCGGAGATCGGCCCGTCGTGGACTA 
CATCGGGCACAACGGTTCAGGTCATGTGTTTTCCGCCAGCCTGCCGCCGGCCGCCGCGGCTGC 
CACCCACGCGGCTCTGCGCGTCAGTCGGCGTGAACCCGACCGGCGGGCTCGGGTGCTGGCCG 

CGGCCGAGTACATGGCCACCGGCCTGGCACGGCAGGGCTATCAGGCCGAGTATCACGGAACC
GCGATCGTGCCGGTGATCCTGGGCAACCCGACCGTGGCGCATGCGGGCTATCTGCGGCTGAT 
GCGCTCCGGGGTGTATGTGAACCCGGTGGCCCCCCCAGCCGTGCCGGAGGAGCGTTCGGGAT 
TCCGCACCAGCTACCTAGCCGACCACCGACAATCTGACCTCGACCGGGCCTTGCACGTGTTTGC 
CGGCCTTGCCGAGGACCTGACCCCGCAAGGAGCCGCGCTATGA 


>Rv0050 ponA1 TB.seq 53661:55694 MW:71119 
>emb|AL123456|MTBH37RV:53661-55697, ponA SEQ ID NO:12 
GTGGTGATCcVGTTGCCGAfGGTCACCTTC^CGATGGCCT^CCTGATCGTfcGACGTTCCC>\AGC
CAGGTGACATCCGTACCAACCAGGTCTCCACGATCCTTGCCAGCGACGGCTCGGAAATCGCCA 
AAATTGTTCCGCCCGAAGGTAATCGGGTCGACGTCAACCTCAGCCAGGTGCCGATGCATGTGC 
GCCAGGCGGTGATTGCGGCCGMGACCGCMTTTCTATTCGMTCCGGGATTCTCGTTCACCGG 

CTTCGCGCGGGCAGTCAAGAACAACCTGTTCGGCGGCGATCTGCAGGGCGGATCGACGATTAC
CCAGCAGTACGTCAAGAACGCGCTGGTCGGTTCCGCACAGCACGGGTGGAGCGGTCTGATGC 
pCAAGGCGAAAGAATTGGTCATCGCGACGAAGATGTCGGGGGAGTGGTCTAAAGACGATGTGC 
TGCAGGCGTATCTGAACATCATCTACTTCGGCCGGGGCGCCTACGGCATTTCGGCGGCGTCCA 
AGGCTTATTTCGACAAGCCCGTCGAGCAGCTGACCGTTGCCGAAGGGGCGTTGTTGGCAGCGC 

TGATTCGGCGGCCTTCGACGCTGGACCCGGCGGTCGACCCCGAAGGGGCCCATGCCCGCTGG
AATTGGGTACTCGACGGCATGGTGGAAACCAAGGCTCTCTCGCCGAATGACCGTGCGGCGCAG 
GTGTTTCCCGAGACAGTGCCGCCCGATCTGGCCCGGGCAGAGAATCAGACCAAAGGACCCAAC 
GGGCTGATCGAGCGGCAGGTGACAAGGGAGTTGCTCGAGCTGTTCAACATCGACGAGCAGACC 
CTCAACACCCAGGGGCTGGTGGTCACCACCACGA7TGATCCGCAGGCCCAACGGGCGGCGGA 

GMGGCGGTTGCGAAATACCTGGACGGGCAGGACCCCGACATGCGTGCCGCCGTGGTTTCCAT
CGACCCGCACAACGGGGCGGTGCGTGCGTACTACGGTGGCGACAATGCCAATGGCTTTGACTT 
CGCTCAAGCGGGATTGCAGACTGGATCGTCGTTTAAGGTGTTTGCTCTGGTGGCCGCCCTTGAG 
CAGGGGATCGGCCTGGGCTACCAGGTAGACAGCTCTCCGTTGACGGTCGACGGCATCAAGATC 
ACCAACGTCGAGGGCGAGGGTTGCGGGACGTGCAACATCGCCGAGGCGCTCAAAATGTCGCT 

GAACACCTCCTACTACCGGCTGATGCTCAAGCTCAACGGCGGCCCACAGGCTGTGGCCGATGC
CGCGCACCAAGCCGGCATTGCCTCCAGCTTCCCGGGCGTTGCGCACACGCTGTCCGAAGATGG 
CAAGGGTGGACCGCCCAACAACGGGATCGTGTTGGGCCAGTACCAAACCCGGGTGATCGACAT 
GGCATCGGCGTATGCCACGTTGGCCGCGTCCGGTATCTACCACCCGCCGCATTTCGTACAGAA 
GGTGGTCAGTGCCAACGGCCAGGTCCTCTTCGACGCCAGCACCGCGGACAACACCGGCGATCA 

GCGCATCCCCAAGGCGGTAGCCGACAACGTGACTGCGGCGATGGAGCCGATCGCAGGTTATTC
GCGTGGCCACAACCTAGCGGGTGGGCGGGATTCGGCGGCCAAGACCGGCACTACGCAATTTG 
GTGACACCACCGCGAACAAAGACGCCTGGATGGTCGGGTACACGCCGTCGTTGTCTACGGCTG 
TGTGGGTGGGCACCGTCAAGGGTGACGAGCCACTGGTAACCGCTTCGGGTGCAGCGATTTACG 
GCTCGGGCCTGCCGTCGGACATCTGGAAGGCAACCATGGACGGCGCCTTGAAGGGCACGTCG 

MCGAGACTTTCCCCAAACCGACCGAGGTCGGTGGTTATGCCGGTGTGCCGCCGCCGCCGCCG
CCGCCGGAGGTACCACCTTCGGAGACCGTCATCCAGCCCACGGTCGAAATTGCGCCGGGGATT 
ACCATCCCGATCGGTCCCCCGACCACCATTACCCTGGCGCCACCGCCCCCGGCCCCGCCCGCT 
GCGACTCCCACGCCGCCGCCGTGA 

>Rv0051 - TB.seq 55694:57373 MW:61210

>emb|AL123456|MTBH37RV:55694-57376, Rv0051 SEQ ID NO:13 
GTGACCGGCGCGCTGTCCCAAAQCAGCAACATCTCGCCACTTCCTTTGGCCGCCGATCTGCGG
AGCGCCGATAACCGCGATTGCCCCAGCCGCACCGACGTATTGGGTGCCGCTCTGGCGAATGTC 
GTCGGTGGCCCGGTAGGCCGGCACGCGCTGATCGGCCGCACCCGGCTGATGACCCCGCTGCG 
GGTGATGTTTGCAATCGCGTTGGTGTTCCTGGCGCTCGGTTGGTCGACGAAAGCGGCCTGCTT 

GGAGTCCAGCGGAACCGGTCCAGGTGATCAGCGGGTGGCCAACTGGGATAACCAGCGTGGTTA
CTACCAGTTGTGCTACTCCGATACGGTGCCGCTCTATGGCGCTGAGTTATTGAGCCAAGGCAAG 
TTTCCGTACAAATCAAGCTGGATCGAAACCGACAGCAACGGCACACCGCAGCTGCGCTACGAC 
GGACAGATCGCGGTGCGCTATATGGAGTATCCGGTGCTGACTGGGATCTATCAGTACCTGTCGA 
TGGCGATAGCCAAGACCTACACCGCGTTAAGCAAGGTGGCTCCCCTCCCGGTGGTTGCCGAAG 

TGGTGATGTTCTTCAACGTCGCCGCGTTCGGTTTGGCGCTGGCGTGGCTGACAACCGTCTGGG
CGACCTCGGGCCTGGCCGGCCGCCGGATATGGGATGCGGCGCTGGTGGCCGCCTGACCGCTG 
GTGATCTTTCAGATATTCACCAATTTCGATGCGCTGGCAACGGGTTTGGCGACGAGTGGGCTGC 
TGGCCTGGGCGCGGCGCAGACCGGTGCTTGCCGGTGTGCTGATCGGGTTGGGCTCCGCGGCG 
AAACTGTATCCGCTGTTGTTCTTGTACCCGTTGTTGCTGCTGGGCATCCGGGCCGGTCGCCTGA 

ATGCTCTGGCCCGCACCATGGCGGCCGCGGCGGCGACCTGGTTGTTGGTGAATCTGCCGGTGA
TGCTGCTCTTTCCGCGCGGCTGGTCGGAGTTCTTCCGGCTCAACACCCGGCGCGGCGACGACA 
TGGACTCGTTGTACAACGTCGTCAAGTCGTTCACCGGCTGGCGTGGCTTCGACCCCACCCTGG 
GCTTCTGGGAGCGGCCGCTGGTGCTGAACACGGTTGTCACGCTCTTGTTCGTGTTATGTTGTGC 
GGCAATTGCTTACATCGCGCTCACCGCACCCCACCGGCCGCGCGTGGCGCAGCTGACTTTCTT 

GACGGTGGCCAGCTTCCTGTTGGTCAACAAGGTGTGGAGTCCCCAGTTCTCGCTTTGGCTGGTG
CCGCTGGCCGTGCTGGCTTTGCCGCACCGCCGGATCTTGCTGGCGTGGATGACGATCGACGCG 
TTGGTGTGGGTGCCGCGGATGTACTACCTATACGGCAACCCGAGCCGCTCGCTGCCCGAGCAG 
TGGTTCACCACGACGGTGTTGCTGCGTGACATCGCCGTGATGGTGCTGTGCGGACTGGTGGTC 
TGGCAGATCTACCGCCCCGGGCGCGACCTCGTGCGTACCGGCGGGCCAGGGGCACTGCCGGC 

TTGTGGGGGAGTCGAGGACCCGGTGGGAGGGGTCTTTGCCAACGCCGCCGACGCCCCGCCAG
GTCGGCTACCGTCGTGGCTGCGTCCCCGGCTGGGCGACGAGCATGCGCGAGAGAGGACGCCC 
GATGCAGGTCGCGATCGCACTTTTTCCGGGCAACACCGCGCTTGA 

>Rv0106 - TB.seq 124372:125565 MW:43701
>emb|AL123456|MTBH37RV:124372-125568, Rv0106 SEQ ID NO:14

ATGCGTACTCCGGTGATATTGGTGGCAGGTCAGGATCACAGCGACGAGGTGACGGGCGCCTTG 
TTGCGCCGGAGCGGAACGGTGGTCGTGGAGCACCGGTTTGACGGCCATGTGGTGCGACGGAT 
GACTGCCACGCTGAGCCGTGGCGAATTGATCACCACGGAGGACGCTTTGGAGTTCGCCCACGG 
CTGTGTGTCGTGCACAATCCGCGACGACCTGCTGGTGCTGTTACGCAGACTGCACCGCCGAGA 

CAATGTCGGCCGGATCGTCGTGCACCTGGCGCCGTGGCTGGAGCCCCAGCCCATCTGCTGGG
CGATCGACCACGTGCGGGTTTGCGTCGGACACGGATACCCAGACGGACCAGCCGCCCTCGAC 
GTGCGGGTCGCGGCCGTGGTGACCTGTGTGGACTGCGTAAGGTGGCTGCCGCAGTCACTCGG 
CGAGGACGAACTGCCCGACGGGCGCACGGTGGCCCAAGTGACGGTCGGTCAGGCCGAGTTCG
CCGACCTTCTGGTGCTGACCCACCCGGAACCGGTCGCCGTGGCGGTTCTGCGCCGACTGGCC 
CCTCGAGCGCGAATCACCGGCGGCGTCGACCGCGTCGAGCTGGCGCTGGCGCATCTGGACGA
CAACTCACGGAGGGGTCGTACCGATACCCCGCACACGCCATTGCTGGCGGGCCTGCCTCCGTT 

GGCAGCCGACGGTGAGGTTGCGATCGTGGAATTCAGTGCCCGCCGCCCGTTTCACCCGCAACG
TCTGCATGCCGCGGTTGACCTGCTGCTCGATGGCGTGGTTCGCACTCGAGGTCGGCTGTGGCT 
GGCCAACCGGCCGGATCAGGTCATGTGGCTCGAATCAGCCGGTGGCGGTCTGCGGGTCGCAT 
CGGCCGGAAAGTGGTTGGCGGCGATGGCGGCCTCGGAGGTGGCCTATGTCGACCTGGAGCGG 
CGGTTGTTCGCCGACCTGATGTGGGTCTACCCGTTCGGAGACCGGCACACCGCGATGACGGTA 

CTGGTATGCGGCGCCGATCCGACCGACATCGTCAATGCCCTGAACGCGGCGCTGCTCAGCGAC
GACGAAATGGCATCTCCGCAACGCTGGCAGTCCTACGTCGACCCTTTCGGCGACTGGCATGAC 
GACCCGTGCCACGAAATGCCCGATGCGGCTGGGGAATTCTCGGCACACCGCAACTCAGGAGAA 
TCTCGATGA 

>Rv0125 - TB.seq 151146:152210 MW:34927
>emb|AL123456|MTBH37RV:151146-152213, pepA SEQ ID NO:15

ATGAGCAATTCGCGCCGCCGCTCACTCAGGTGGTCATGGTTGCTGAGCGTGCTGGCTGCCGTC 
GGGCTGGGCCTGGCCACGGCGCCGGCCCAGGCGGCCCCGCCGGCCTTGTCGCAGGACCGGT 
TCGCCGACTTCCCCGCGCTGCCCCTCGACCCGTCCGCGATGGTCGCCCAAGTGGGGCCACAG 

GTGGTCAACATCAACACCAAACTGGGCTACAACAACGCCGTGGGCGCCGGGACCGGCATCGTC
ATCGATCCCAACGGTGTCGTGCTGACCAACAACCACGTGATCGCGGGCGCCACCGACATCAAT 
GCGTTCAGCGTCGGCTCCGGCCAAACCTACGGCGTCGATGTGGTCGGGTATGACCGCACCCAG
GATGTCGCGGTGCTGCAGCTGCGCGGTGCCGGTGGCCTGCCGTCGGCGGCGATCGGTGGCG 
GCGTCGCGGTTGGTGAGCCCGTCGTCGCGATGGGCAACAGCGGTGGGCAGGGCGGAACGCC 

CCGTGCGGTGCCTGGCAGGGTGGTCGCGCTCGGCCAAACCGTGCAGGCGTCGGATTCGCTGA
CCGGTGGCGAAGAGACATTGAACGGGTTGATCCAGTTCGATGCCGCGATCCAGCCCGGTGATT
CGGGCGGGCCCGTCGTCAACGGCCTAGGACAGGTGGTCGGTATGAACACGGCCGCGTCCGAT 
AACTTCCAGCTGTCCCAGGGTGGGCAGGGATTCGCCATTCCGATCGGGCAGGCGATGGCGATC 
GCGGGCCAGATCCGATCGGGTGGGGGGTCACCCACCGTTCATATCGGGCCTACCGCCTTCCTC 

GGCTTGGGTGTTGTCGACAACAACGGCAACGGCGCACGAGTCCAACGCGTGGTCGGGAGCGC
TCCGGCGGCAAGTCTCGGCATCTCCACCGGCGACGTGATCACCGCGGTCGACGGCGCTCCGAT 
CAACTCGGCCACCGCGATGGCGGACGCGCTTAACGGGCATCATCCCGGTGACGTCATCTCGGT 
GACCTGGCAAACCAAGTCGGGCGGCACGCGTACAGGGAACGTGACATTGGCCGAGGGACCCC
CGGCCTGA 


>Rv0350 dnaK 70 kD heat shock protein, chromosome replication TB.seq 419833:421707 MW:66832
>emb|AL123456|MTBH37RV:419833-421710, dnaK SEQ ID NO:16

ATGGCTCGTGCGGTCGGGATCGACCTCGGGACCACCAACTCCGTCGTCTCGGTTCTGGAAGGT 
GGCGACCCGGTCGTCGTCGCCAACTCCGAGGGCTCCAGGACCACCCCGTCAATTGTCGCGTTC 
GCCCGCAACGGTGAGGTGCTGGTCGGCCAGCCCGCCAAGAACCAGGCAGTGACCAACGTCGA 
TCGCACCGTGCGCTCGGTCAAGCGACACATGGGCAGCGACTGGTCCATAGAGATTGACGGCAA
GAAATACACCGCGCCGGAGATCAGCGCCCGCATTCTGATGAAGCTGAAGCGCGACGCCGAGGC 
CTACCTCGGTGAGGACATTACCGACGCGGTTATCACGACGCCCGCCTACTTCAATGACGCCCAG 
CGTCAGGCCACCAAGGACGCCGGCCAGATCGCCGGCCTCAACGTGCTGCGGATCGTCAACGA 
GCCGACCGCGGCCGCGCTGGCCTACGGCCTCGACAAGGGCGAGAAGGAGCAGCGAATCCTGG 

TCTTCGACTTGGGTGGTGGCAGTTTCGACGTTTCCCTGCTGGAGATCGGCGAGGGTGTGGTTGA
GGTCCGTGCCACTTCGGGTGACAACCACCTCGGCGGCGACGACTGGGACCAGCGGGTCGTCG 
ATTGGCTGGTGGACAAGTTCAAGGGCACCAGCGGCATCGATCTGACCAAGGACAAGATGGCGA 
TGCAGCGGCTGCGGGAAGCCGCCGAGAAGGCAAAGATCGAGCTGAGTTCGAGTCAGTCCACCT 
CGATCAACCTGCCCTACATCACCGTCGACGCCGACAAGAACCCGTTGTTCTTAGACGAGCAGCT

GACCCGCGCGGAGTTCCAACGGATCACTCAGGACCTGCTGGACCGCACTCGCAAGCCGTTCCA
GTCGGTGATCGCTGACACCGGCATTTCGGTGTCGGAGATCGATCACGTTGTGCTCGTGGGTGG 
TTCGACCCGGATGCCCGCGGTGACCGATCTGGTCAAGGAACTCACCGGCGGCAAGGAACCCAA 
CAAGGGCGTCAACCCCGATGAGGTTGTCGCGGTGGGAGCCGCTCTGCAGGCCGGCGTCCTCA 
AGGGCGAGGTGAAAGACGTTCTGCTGCTTGATGTTACCCCGCTGAGCCTGGGTATCGAGACCA 

AGGGCGGGGTGATGACCAGGCTCATCGAGCGCAACACCACGATCCCCACCAAGCGGTCGGAG
ACTTTCACCACCGCCGACGACAACCAACCGTCGGTGCAGATCCAGGTCTATCAGGGGGAGCGT 
GAGATCGCCGCGCACAACAAGTTGCTCGGGTCCTTCGAGCTGACCGGCATCCCGCCGGCGCC 
GCGGGGGATTCCGCAGATCGAGGTCACTTTCGACATCGACGCCAACGGCATTGTGCACGTCAC 
CGCCAAGGACAAGGGCACCGGCAAGGAGAACACGATCCGAATCCAGGAAGGCTCGGGCCTGT 

CCAAGGAAGACATTGACCGCATGATCAAGGACGCCGAAGCGCACGCCGAGGAGGATCGCAAGC
GTCGCGAGGAGGCCGATGTTCGTAATCAAGCCGAGACATTGGTCTACCAGACGGAGAAGTTCG 
TCAAAGAACAGCGTGAGGCCGAGGGTGGTTCGAAGGTACCTGAAGACACGCTGAACAAGGTTG 
ATGGCGCGGTGGCGGAAGCGAAGGCGGCACTTGGCGGATCGGATATTTCGGCCATCAAGTCG 
GCGATGGAGAAGCTGGGCCAGGAGTCGCAGGCTCTGGGGCAAGGGATCTACGAAGCAGCTCA 

GGCTGCGTCACAGGCCACTGGCGCTGCCCACCCCGGCGGCGAGCCGGGCGGTGCCCACCCC
GGCTCGGCTGATGACGTTGTGGACGCGGAGGTGGTCGACGACGGCCGGGAGGCCAAGTGA 

>Rv0351 grpE stimulates DnaK ATPase activity TB.seq 421707:422411 MW:24501
>emb|AL123456|MTBH37RV:421707-422414, grpE SEQ ID NO:17
GTGACGGACGGAAATCAAAAGCCGGATGGCAATTCGGGCGAACAGGTAACCGTCACTGACAAG
CGGCGGATCGATCCCGAGACGGGTGAAGTGCGGCACGTCCCTCCCGGCGACATGCCGGGAGG 
GACGGCTGCGGCCGATGCGGCGCACACCGAAGACAAGGTCGCCGAGCTGACCGCCGATCTGC 
AACGCGTGCAGGCCGACTTCGCCAACTACCGTAAGCGGGCGTTGCGCGATCAGCAGGCGGCC
GCTGACCGAGCCAAGGCCAGCGTTGTCAGCCAATTGCTGGGTGTACTGGACGATCTCGAGCGG 
GCGCGCAAGCACGGCGATTTGGAGTCGGGTCCACTGAAGTCGGTCGCCGACAAGCTAGACAGC 
GCGTTGACCGGGCTGGGTCTGGTGGCGTTCGGTGCCGAGGGCGAGGATTTCGACCCCGTGCT 
GCACGAAGCGGTGCAACACGAGGGCGACGGCGGGCAGGGGTCCAAGCCGGTAATCGGCACC
GTCATGCGGCAGGGCTACCAACTGGGTGAGCAGGTGCTGCGGCACGCCTTGGTCGGCGTCGT 
CGACACGGTGGTCGTCGACGCGGCCGAACTGGAGTCAGTCGACGACGGCACTGCGGTCGCAG 
ATACCGCCGAAAACGATCAAGCTGACCAGGGCAATAGCGCCGACACCTCGGGCGAACAGGCAG 
AATCAGAACCGTCGGGCAGTTAA 


>Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase TB.seq 422450:423634 MW:41346
>emb|AL123456|MTBH37RV:422450-423637, dnaJ SEQ ID NO:18 

ATGGCCCAAAGGGAATGGGTCGAAAAAGACTTCTACCAGGAGCTGGGCGTCTCCTCTGATGCC 
AGTCCTGAAGAGATCAAACGTGCCTATCGGAAGTTGGCGCGCGACCTGCATCCGGACGCGAAC 

CCGGGCAACCCGGCCGCCGGCGAACGGTTCAAGGCGGTTTCGGAGGCGCATAACGTGCTGTC
GGATCCGGCCAAGCGCAAGGAGTACGACGAAACCCGCCGCCTGTTCGCCGGCGGCGGGTTCG 
GCGGCCGTCGGTTCGACAGCGGCTTTGGGGGCGGGTTCGGCGGTTTCGGGGTCGGTGGAGAC 
GGCGCCGAGTTCAACCTCAACGACTTGTTCGACGCCGCCAGCCGAACCGGCGGTACCACCATC 
GGTGACTTGTTCGGTGGCTTGTTCGGACGCGGTGGCAGCGCCCGTCCCAGCCGCCCGCGACG 

CGGCAACGACCTGGAGACCGAGACCGAGTTGGATTTCGTGGAGGCCGCCAAGGGCGTGGCGA
TGCCGCTGCGATTAACCAGCCCGGCGCCGTGCACCAACTGCCATGGCAGCGGGGCCCGGCCA 
GGCACCAGCCCAAAGGTGTGTCCCACTTGCAACGGGTCGGGCGTGATCAACCGCAATCAGGGC 
GCGTTCGGCTTCTCCGAGCCGTGCACCGACTGCCGAGGTAGCGGCTCGATCATCGAGCACCCC 
TGCGAGGAGTGCAAAGGCACCGGCGTGACCACCCGGACCCGAACCATCAACGTGCGGATCCC

GCCCGGTGTCGAGGATGGGCAGCGCATCCGGCTAGCCGGTCAGGGCGAGGCCGGGTTGCGC
GGCGCTCCCTCGGGGGATCTCTACGTGACGGTGCATGTGCGGCCCGACAAGATCTTCGGCCGC 
GACGGCGACGACCTCACCGTGACCGTTCCGGTCAGCTTCACCGAATTGGCTTTGGGCTCGACG 
CTGTCGGTGCCTACCCTGGACGGCACGGTCGGGGTCCGGGTGCCCAAAGGCACCGCTGACGG 
CCGCATTCTGCGTGTGCGCGGACGCGGTGTGCCCAAGCGCAGTGGGGGTAGCGGCGACCTAC 

TTGTCACCGTGAAGGTGGCCGTGCCGCCCAATTTGGCAGGCGCCGCTCAGGAAGCTCTGGAAG
CCTATGCGGCGGCGGAGCGGTCCAGTGGTTTCAACCCGCGGGCCGGATGGGCAGGTAATCGC 
TGA 

>Rv0363c fba fructose bisphosphate aldolase TB.seq 441266:442297 MW:36545
>emb|AL123456|MTBH37RV:c442297-441263, fba SEQ ID NO:19

ATGCCTATCGCAACGCCCGAGGTCTACGCGGAGATGCTCGGTCAGGCCAAACAAAACTCGTAC 
GCTTTCGCGGCTATCAACTGCACCTCCTCGGAAACCGTCAACGCCGCGATCAAAGGTTTCGCCG
ACGCCGGCAGTGACGGAATCATCCAGTTCTCGACCGGTGGCGCAGAATTCGGCTCCGGCCTCG
GGGTCAAAGACATGGTGACCGGTGCGGTCGCCTTGGCGGAGTTCACCCACGTTATCGCGGCCA 
AGTACCCGGTCAACGTGGCGCTGCACACCGACCACTGCCCCAAGGACAAGTTGGACAGCTATG 
TCCGGCCCTTGCTGGCGATCTCGGCGCAACGCGTGAGCAAAGGTGGCAATCCTTTGTTCCAGT 

CGCACATGTGGGACGGCTCGGCAGTGCCAATCGATGAGAACCTGGCCATCGCCCAGGAGCTGC
TCAAGGCGGCGGCGGCCGCCAAGATCATTCTGGAGATCGAGATCGGCGTCGTCGGCGGCGAA 
GAGGACGGCGTGGCGAACGAGATCAACGAGAAGCTGTACACCAGCCCGGAGGACTTCGAGAAA 
ACCATCGAGGCGCTGGGCGCCGGTGAGCACGGCAAATACCTGCTGGCCGCGACGTTCGGCAA 
CGTGCATGGCGTCTACAAGCCCGGCAACGTCAAGCTTCGCCCCGACATCCTTGCGCAAGGGCA 

ACAGGTGGCGGCGGCCAAGCTCGGACTGCCGGCCGACGCCAAGCCGTTCGACTTCGTGTTCC
ACGGCGGCTCGGGTTCGCTTAAGTCGGAGATCGAGGAGGCGCTGCGCTACGGCGTGGTGAAG 
ATGAACGTCGACACCGACACCCAGTACGCGTTCACCCGCGCGATCGCCGGTCACATGTTCACC 
AACTACGACGGAGTGCTCAAGGTCGATGGCGAGGTGGGTGTCAAGAAGGTCTACGACCCGCGC 
AGCTACCTCAAGAAGGCCGAAGCTTCGATGAGCCAGCGGGTCGTTCAGGCGTGCAATGACCTG 

CACTGCGCCGGAAAGTCCCTAACCCACTAA

>Rv0405 pks6 TB.seq 485729:489934 MW:147615
>emb|AL123456|MTBH37RV:485729-489937, pks6 SEQ ID NO:20

ATGACAGACGGTTCGGTCACTGCGGATMGCTTCAAAAATGGTTTCGAGAGTACTTGTCCACGC 
ATATCGAGTGTCATCCAAATGAGGTCAGCCTAGACGTTCCGATTAGAGATTTAGGTTTGAAATCG
ATTGATGTCTTAGCGATTCCCGGCGACCTCGGTGACAGATTTGGGT^ 

CGTTTGGGATMTCCTAGCGCTMTGATTTGATTGATAGTCTGTTGAACCAGCGTAGTGCTGACT 
CGTTAAGAGAGAGTCATGGACACGCCGACAGGAACACGCAGGGTCGGGGCAGCATAAACGAGC 
CGGTTGCGGTCATCGGAGTGGGCTGTCGATTTCCGGGAGATATTGACGGCCCGGAACGGCTAT 

GGGACTTTCTGACCGAGAAGAAGTGTGCGATAACAGCGTATCCAGATCGTGGGTTCACGAATGC
TGGAACTTTCGCGGAGTCCGGAGGC I I 1 1 I AAAGGATGTCGCGGGTTTCGATAATAGATTTTTTG 
ATATCCCGCCGGACGAGGCTCTGCGAATGGATCCGCMCAACGGTTGTTACTGGAGGTCTCTTG 
GGAAGCGTTAGAGCATGCAGGAATTATTCGTGAGTCATTAAGACTTTCACGTACGGGCGTATTC 
GTTGGGGTGTCGTCAACTGACTACGTCCGGCTTGTGTCAGCTAGCGCTCAGCAAAAGTCTACTA 

TTTGGGATAACACCGGCGGTTGTrCGAGTATTATTGCCAATAGAATCTCATACTTTCTCGATATTC
AGGGTCCGTCCATTGTCATTGACACGGCATGCTCGTCATCCCTGGTCGCCGTGCATCTAGCCTG 
TGGAAGTCTCAGTACCTGGGACTGCGATATCGCACTTGTCGGTGGGACGAATGTTCTTATTTCAC 
CAGAAGCATGGGGTGGGTTTAGGGAAGCGGGCATCTTGTCGCAGACAGGCTGCTGTCACGCGT 
TCGATAAATCCGCCGACGGGATGGTACGCGGTGAGGGATGCGGAGTTATCGTGCTGCAGCGCC 

TCAGTGATGCACGCCTTGAGGGCCGGCGGATATTAGCGATTCTGACGGGTTCAGCGGTCAATC
AGGACGGTMGTCCMCGGTATTATGGCGCCAAATCCTAGTGCGCAAATTGGTGTTCTTGAAAAT 
GCATGCAAGAGCGCTCGCGTCGATCCGCTGGAAATCGGCTAGGTCGAGGCCCACGGGACCGG 
AACGTCGTTAGGGGATAGGATCGAGGCGCACGCCTTAGGCATGGTCTTTGGTCGCAAGAGACC
GGGATCTGGGCCCCTGATGATCGGGAGCATCAAGCCGAATATCGGCCATCTGGAAGGTGCGGC 
TGGCATCGCCGGATTGATCAAGGCGGTGTTGATGGTTGAGCGTGGCTCGCTGCTTCCGAGCGG 
GGGGTTTACGGAGCCAAATCCAGCTATCCCATTCACGGAATTGGGCCTGAGAGTTGTAGACGAA 

CTTCAGGAGTGGCCGGTGGTGGGGGGTCGGCCGCGCCGGGCTGGGGTGTCATCGTTCGGCTT
TGGCGGCACCAATGCGCATGTGATTGTCGAGGAAGCTGGTTCGGTTGGGGCGGACACGGTTTC
GGGCCGCGCGGATGTTGGCGGTTCCGGTGGTGGGGTGGTGGCGTGGGTGATTTCGGGGAAGA 
CGGCTTCGGCGTTGGCTGCTCAGGCGGGTCGGTTGGGGCGGTATGTGCGGGCTCGGCCGGCG 
CTTGATGTTGTTGATGTGGGGTATTCGTTGGTGAGCACGCGGTCGGTGTTTGATCATCGGGCGG 

TGGTGGTCGGCCAGACTCGCGATGAGTTGCTGGCTGGGTTGGCTGGGGTGGTTGCTGGTCGG
CCGGAGGCTGGGGTGGTCTGCGGTGTTGGCAAGCCGGCGGGCAAGACGGCTTTTGTGTTTGC 
CGGTCAGGGCTGGCAGTGGCTGGGTATGGGTAGCGAGCTTTATGCTGCGTACCCGGTTTTCGC 
CGAGGCCCTCGATGCTGTGGTGGACGAGTTGGACCGGCACCTGCGGTATCCGCTGCGCGATGT 
GATCTGGGGGCACGACCAAGATCTGTTGAATACCACCGAATTCGCCCAGCCGGCGCTGTTTGC 

GGTGGAGGTGGCGCTGTATCGGCTGCTCATGTCGTGGGGGGTGCGGCCGGGTTTGGTGCTGG
GTCATTCGGTGGGCGAGTTGGCCGCGGCGCACGTCGCCGGGGCGCTGTGTTTGCCGGATGCG
GGGATGCTGGTGGCCGCGCGTGGACGGTTGATGCAGGCGTTGCCCGCCGGCGGCGCCATGTT 
TGCGGTGCAGGCCCGTGAAGACGAGGTAGCGCCGATGCTGGGGCACGATGTGAGCATCGCGG 
CGGTCAATGGTCCGGCTTCGGTGGTGATCTCTGGTGCCCACGATGCGGTGAGCGCGATCGCTG 

ATCGGCTGCGCGGCCAGGGCCGTCGGGTCCACCGGTTGGCGGTCTCGCATGCCTTTCACTCG
GCGTTGATGGAGCCGATGATCGCTGAGTTCACAGCCGTTGCGGCCGAACTGTCTGTGGGCTTG 
CCCACGATCCCGGTCATTTCCAATGTGACCGGGCAGTTGGTGGCCGACGACTTCGCCTCAGCT 
GATTACTGGGCCCGGCATATCCGGGCGGTGGTGCGG1TTGGCGACAGTGTTCGTAGTGCCCAC 
TGCGCCGGTGCCAGTCGTTTCATCGAAGTCGGGCCCGGTGGCGGCTTGACGTCGTTGATCGAG 

GCATCGCTGGCCGACGCGCAGATCGTGTCGGTGCCCACGCTGCGCAAAGATCGGCCCGAACC
GGTCAGTGTGATGACGGCGGCGGCCCAGGGCTTCGTCTCGGGGATGGGCCTGGATTGGGCCT 
CGGTGTTTTCCGGGTACGGGCCCAAGCGGGTGGAGTTGCCGACGTATGCCTTCCAGCATCAAA 
AGTTCTGGCTCGCACCAGCCCCATCGGTCAGCGACCCCACCGCCGCCGGCCAGATCGGGGCT 
AGCGATGGTGGTGCTGAACTCTTGGCGTCCTCCGGGTTTGCCGCCCGGCTGGCCGGTCGGTCG 

GCCGACGAGCAACTCGCCGCAGCGATCGAGGTGGTATGTGAGCATGCCGCAGCGGTGCTGGG
GCGCGACGGCGCTGCCGGACTCGACGCTGGCCAGGCGTTTGCCGATTCGGGATTTAATTCCTT 
GAGTGCCGTGGAGCTACGTAACCGCTTAACAGCCGTCACCGCAGTAACGCTGCCGGCCACCGC 
GATCTTCGATCACCCCACCCCGACCGAACTAGCCCAGTATCTGATCAGCCAAATAGACGGTCAC 
GGCAGCTCCGCCGCCGCAGCGGCAAACCCGGCGGAGCGAATCGATGCGCTCACCGATCTTTTT 

CTACAAGCTTGCGATGCGGGTCGGGATGCCGATGGTTGGAAGATGGTCGCCCTGGCGTCGAAT
ACGCGCGAGCGCATGAGCTCACCGGTTCGGAACAACGTATCGAAGAACGTCGCACTGCTGGCA 
GATGGTATCTCCGATGTGGTTGTAATTTGTATCCCAACTCTAACTGTGCTATCGGATCAGCGTGA 
ATATCGAGATATTGCGMTGCGATQACAGGCCGCC^TTCGGTTTATTCGCTTACGCTTCCCGGG
TTCGATTCGTCTGATGCACTGCCGCAAAACGCGGATATGATTGTTGAAACCGTATCTAACGCAAT 
TATTGATGTGGTAGGCGGCAGCTGCCGTTTTGTGCTGTCGGGCTATTCATCGGGTGGGGTGTTG 
GCCTATGCCCTCTGCTCCCATCTGTCGGTCAAGCACCAGCGGAATCCCCTCGGAGTCGCACTCA 

TCGATACATATCTGCCTAGTCAGATCGCCAATCCTTCAATGAATGAAGGGTTCAGCCCCAACGAT
ACTGGGAAGGGCCTTTCCCGTGAAGTAATTCGAGTGGCCAGAATGTTGAATCGGTTAACTGCCA 
CCCGACTCACCGCGGCAGCCACCTATGCTGCAATCTTTCAGGCCTGGGAACCAGGTAGATCAAT 
GGCTCCGGTTCTTAACATCGTGGCGAAGGACCGAATAGCTACCGTCGAAAATTTACGCGAAGAA 
CGAATCAACCGGTGGCGAACTGCTGCTGCAGAGGCGGCCTATTCTGTAGCCGAAGTACCCGGG 

GATCATTTCGGAATGATGAGCACCTCGAGTGAGGCAATAGCTACCGAAATACATGATTGGATTTC
TGGGCTCGTTCGAGGGCCTCATCGGTAG 

>Rv0435c - ATPase of AAA-family TB.seq 522348:524531 MW:75315
>emb|AL123456|MTBH37RV:c524531-522345, Rv0435c SEQ ID NO:21 

GTGACCCACCCGGACCCGGCCCGCCAACTCACCCTTACCGCCCGGCTGAACACCTCGGCCGTC
GACTCACGCCGCGGCGTCGTTCGGTTGCACCCCAATGCCATTGCTGCCCTTGGCATCCGCGAG 
TGGGACGCGGTGTCGCTGACCGGCTCTCGGACAACCGCCGCGGTCGCCGGCCTGGCCGCGGC 
AGACACCGCGGTCGGGACGGTGCTGCTCGATGACGTCACACTGTCCAATGCGGGCCTTCGCGA 
AGGCACCGAGGTGATCGTCAGCCCGGTCACCX3TCTACGGAGCGCGATCGGTGACGCTGAGCG 

GTTCAACGCTGGCCACCCAGTCGGTGCCGCCGGTCACGCTGCGGCAGGCCCTACTCGGCAAG
GTGATGACCGTCGGTGACGCGGTCTCGCTGCTGCCCCGCGATCTAGGCCCCGGCACATCCACG 
TCGGCTGCCAGCCGCGCATTGGCAGCTGCGGTCGGGATCAGTTGGACCTCGGAGCTGCTGACC 
GTTACCGGCGTCGACCCCGACGGGCCGGTCAGCGTGCAGCCCAACTCGCTGGTCACCTGGGG 
CGCTGGGGTCCCGGCCGCAATGGGTACGTCCACGGCGGGGCAAGTGAGCATCTCGAGTCCGG 

AGATCCAGATCGAAGAGCTCAAGGGCGCCCAGCCGCAGGCTGCCAAGCTCACCGAATGGCTCA
AGCTTGCCCTCGATGAGCCGCACCTACTACAGACCTTGGGCGCCGGCACCAATTTGGGTGTGC 
TGGTGTCGGGTCCGGCCGGGGTGGGCAAGGCGACGCTGGTGCGCGCGGTGTGCGACGGCCG 
AAGGTTGGTGACACTGGATGGTCCGGAGATTGGAGCTCTGGCCGCCGGAGACCGGGTCAAAGC 
CGTGGCCTCGGCAGTGCAGGCGGTTCGCCATGAGGGCGGTGTGTTGCTGATCACCGATGCCGA 

CGCCCTGCTGCCAGCCGCCGCCGAGCCGGTAGCCTCGCTGATCCTGTCCGAGCTGCGTACCG
CGGTGGCCACCGCCGGTGTGGTATTGATCGCCACCTCAGCACGGCCCGATCAACTCGATGCCC 
GGCTGCGTTCCCCCGAGTTGTGCGACCGGGAGCTTGGCCTGCCGCTGCCCGACGCGGCCACC 
CGCAAATCGCTGCTGGAGGCGCTGCTGAATCCGGTTCCTACCGGAGACCTCAACCTCGACGAA 
ATCGCCTCCCGCACACCGGGTTTCGTCGTGGCCGACCTGGCTGCGCTGGTTCGCGAGGCGGC 

GCTGCGGGCAGCGTCTCGAGCCAGTGCCGACGGCCGACCACCGATGCTGCACCAAGACGACC
TCCTCGGTGCGTTGACCGTCATCCX3GCCGCTGTCCCGCTCGGCCAGCGACGAAGTCACCGTGG 
GTGACGTGACGCTCGACGATGTCGGTGACATGGCCGCGGCCAAACAAGCACTGACCGAGGCG
GTGCTGTGGCCGCTGCAGCACCCCGACACCTTCGCTCGGCTAGGTGTCGAACCGCCGCGCGG

GGTGTTGCTGTACGGCCCGCCCGGCTGCGGCAAGACCTTTGTGGTTCGTGCCCTGGCCAGCAC 

CGGACAGTTGAGCGTGCATGCCGTCAAAGGGTCGGAGCTGATGGACAAGTGGGTGGGCTCCTC 

GGAGAAGGCAGTCCGCGAGCTATTCCGGCGGGCCCGCGACTCCGCGCCGTCACTGGTGTTCC 

TCGACGAGCTGGACGCTCTGGCGCCACX3GCGCGGTCAGAGCTTCGACTCGGGCGTCTCCGAC 

CGGGTGGTGGCCGCGCTGCTGACTGAGCTCGACGGTATTGACCCGCTGCGGGATGTCGTCATG 

CTAGGCGCGACCAACCGGCCCGATCTGATAGACCCGGCGCTGCTGCGCCCGGGGCGGCTAGA 

ACGGCTGGTGTTCGTTGAACCGCCCGACGCTGCCGCTCGCCGCGAAATCCTGCGCACCGGTGG 

CAAGTCGATCCCGCTGAGCTCCGACGTCGACCTGGACGAGGTGGCAGCCGGACTCGACGGTTA 

TAGTGCCGCCGACTGTGTGGCGCTGCTGCGCGAAGCCGCGCTTACCGCGATGCGGCGTTCCAT 

CGATGCCGCCAACGTCACCGCCGCCGACCTGGCGACCGCGCGAGAAACCGTGCGCGCGTCGC 

TGGATCCGCTGCAGGTGGCGTCGCTGCGTAAGTTCGGCACCAAGGGTGACCTTCGGTCCTAG 

>Rv0436c pssA CDP-diacylglycerol-serine O-phosphatidyltransferase TB.seq 524531:525388 MW:31219
>emb|AL123456|MTBH37RV:c525388-524528, pssA SEQ ID NO:22

ATGATCGGAAAGCCCCGCGGCAGGCGAGGGGTAAACCTGCAGATACTGCCCAGCGCGATGAC 

GGTGCTGTCCATTTGCGCGGGACTGACCGCAATCAAGTTTGCGCTCGAGCACCAGCCGAAGGC 

CGCGATGGCACTGATCGCCGCAGCGGCCATCCTCGACGGGCTCGACGGCCGGGTGGGCCGCA 

TCCTGGATGCCCAGTCGCGGATGGGCGCAGAGATCGACTCACTGGCCGACGCGGTGAACTTCG 

GAGTGACACGCGCGCTGGTGCTTTACGTGTCGATGTTGTCGAAGTGGCCGGTCGGTTGGGTGG 

TCGTGCTGCTCTACGCGGTGTGCGTGGTATTACGGCTGGCGCGGTACAACGCACTGCAGGACG 

ACGGAACCCAGCCCGCCTACGCGCATGAATTCTTCGTCGGMTGCCCGCGCCGGCGGGCGCG 

GTTTCCATGATCGGCCTGCTAGCCCTCAAAATGCAGTTCGGCGAAGGATGGTGGACCTCGGGCT 

GGTTCCTCAGCTTTTGGGTGACGGGAACGTCGATACTCTTGGTCAGCGGGATCCCGATGAAAAA 

GATGCACGCCGTGTCGGTACCACCCAACTACGCGGCCGCCCTGCTGGCGGTGCTGGCTATCTG 

CGCGGCGGCCGCAGTCCTGGCCCCCTACTTGTTGATCTGGGTGATCATCATCGCCTACATGTGC 

CATATTCCTTTCGCGGTGCGCAGCCAGCGCTGGCTTGCCCAACACCCTGAGGTGTGGGACGAC 

AAGCCCAAGCAACGGCGCGCGGTGCGGCGCGCGAGCCGCCGGGCGCATCCCTACCGGCCGT 

CGATGGCGCGGCTGGGCCTGCGCAAGCCGGGTCGACGGCTGTGA 

>Rv0440 groEL2 60 kD chaperonin 2 TB.seq 528606:530225 MW:56728
>emb|AL123456|MTBH37RV:528606-530228, groEL2 SEQ ID NO:23 

ATGGCCAAGACAATTGCGTACGACGAAGAGGCCCGTCGCGGCCTCGAGCGGGGCTTGAACGC 

CCTCGCCGATGCGGTAAAGGTGACATTGGGCCCCAAGGGCCGCAACGTCGTCCTGGAAAAGAA 

GTGGGGTGCCCCCACGATCACCAACGATGGTGTGTCCATCGCCAAGGAGATCGAGCTGGAGGA 

TCCGTACGAGAAGATCGGCGCCGAGCTGGTCAAAGAGGTAGCCAAGAAGACCGATGACGTCGC 

CGGTGACGGCACCACGACGGCCACCGTGCTGGCCCAGGCGTTGGTTCGCGAGGGCCTGCGCA






WO 01/35317 



PCT/US00/31152 



ACGTCGCGGCCGGCGCCAACCCGCTCGGTCTCAAACGCGGCATCGAAAAGGCCGTGGAGAAG 
GTCACCGAGACCCTGCTCAAGGGCGCCAAGGAGGTCGAGACCAAGGAGCAGATTGCGGCCAC 
CGCAGCGATTTCGGCGGGTGACCAGTCCATCGGTGACCTGATCGCCGAGGCGATGGACAAGGT 
GGGCAACGAGGGCGTCATCACCGTCGAGGAGTCCAACACCTTTGGGCTGCAGCTCGAGCTCAC 

CGAGGGTATGCGGTTCGACAAGGGCTACATCTCGGGGTACTTCGTGACCGACCCGGAGCGTCA
GGAGGCGGTCCTGGAGGACCCCTACATCCTGCTGGTCAGCTCCAAGGTGTCCACTGTCAAGGA 
TCTGCTGCCGCTGCTCGAGAAGGTCATCGGAGCCGGTAAGCCGCTGCTGATCATCGCCGAGGA 
CGTCGAGGGCGAGGCGCTGTCCACCCTGGTCGTCAACAAGATCCGCGGCACCTTCAAGTCGGT 
GGCGGTCAAGGCTCCCGGCTTCGGCGACCGCCGCAAGGCGATGCTGCAGGATATGGCCATTCT 

CACCGGTGGTCAGGTGATCAGCGAAGAGGTCGGCCTGACGCTGGAGAACGCCGACCTGTCGC
TGCTAGGCAAGGCCCGCAAGGTCGTGGTCACCAAGGACGAGACCACCATCGTCGAGGGCGCC 
GGTGACACCGACGCCATCGCCGGACGAGTGGCCCAGATCCGCCAGGAGATCGAGAACAGCGA 
CTCCGACTACGACCGTGAGAAGCTGCAGGAGCGGCTGGCCAAGCTGGCCGGTGGTGTGGCGG 
TGATCAAGGCCGGTGCCGCCACCGAGGTCGAACTCAAGGAGCGCAAGCACCGCATCGAGGAT 

GCGGTTCGCAATGCCAAGGCCGCCGTCGAGGAGGGCATCGTCGCCGGTGGGGGTGTGACGCT
GTTGCAAGCGGCCCGGACCCTGGACGAGCTGAAGCTCGAAGGCGACGAGGCGACCGGCGCCA 
ACATCGTGAAGGTGGCGCTGGAGGCCCCGCTGAAGCAGATCGCCTTCAACTCCGGGCTGGAGC 
CGGGCGTGGTGGCCGAGAAGGTGCGCAACCTGCCGGCTGGCCACGGACTGAACGCTCAGACC 
GGTGTCTACGAGGATCTGCTCGCTGCCGGCGTTGCTGACCCGGTCAAGGTGACCCGTTCGGCG 

CTGCAGAATGCGGCGTCCATCGCGGGGCTGTTCCTGACCACCGAGGCGGTCGTTGCCGACAAG
CCGGAAAAGGAGAAGGCTTCCGTTCCCGGTGGCGGCGACATGGGTGGCATGGATTTCTGA 

>Rv0482 murB TB.seq 570537:571643 MW:38522
>emb|AL123456|MTBH37RV:570537-571646, murB SEQ ID NO:24

ATGAAACGGAGCGGTGTCGGTTCGCTCTTTGCCGGTGCGCATATTGCCGAGGCGGTCCCGTTG
GCGCCGCTGACCACTTTGCGTGTGGGCCCGATCGCCCGACGTGTCATCACTTGCACCAGCGCC 
GAACAGGTGGTGGCTGCGCTGCGGCACCTGGATTCGGCGGCCAAGACCGGAGCTGACCGCGC 
GGTGGTGTTTGCTGGTGGCTCCMTTTGGTGATCGCCGAGAACCTGACCGACCTGACCGTGGT 
GCGGTTGGCCAATAGCGGCATCACCATCGACGGTAACTTGGTGCGGGCCGAGGCCGGTGCGG 

TCTTCGATGACGTGGTGGTTAGGGCCATCGAACAGGGTCTGGGCGGACTGGAATGCCTGTCTG
GCATCCCAGGATGGGCCGGGGCGACACCCGTGCAGAACGTGGGGGCGTATGGCGCGGAGGT 
GTCTGACACCATCACTCGGGTTCGGCTTTTGGATCGGTGCACGGGTGAGGTGCGTTGGGTATC 
CGCGCGCGACCTGCGCTTCGGCTATCGCACGAGCGTGCTCAAACACGCTGATGGGGTTGCGGT 
GGCCACCGTGGTCTTGGAGGTGGAGTTTGCGCTGGATCCGTCGGGCCGCAGCGCACCGGTGC 

GCTACGGCGAGCTGATCGCCGCGCTGAATGCGACCAGCGGCGAGCGCGCCGACCCGCAAGCG
GTCCGCGAAGCGGTGCTGGCCCTGCGGGCACGCAAGGGCATGGTGCTGGACCCGACCGACCA 
TGACACGTGGAGCGTGGGATCGTTCTTCACAAACCCGGTGGTGACCCAGGATGTTTACGAACGG 
CTGGCGGGTGACGCGGCCACCAGAAAGGACGGTCCGOTCCCGCACTATCCCGCGCCCGACGG
CGTCAAGCTGGCCGCCGGCTGGCTGGTGGMCGGGCCGGCTTCGGCAAGGGCTATCCGGATG 
CCGGCGCCGCCCCATGCCGGCTTTCCACCAAACATGCGCTGGCGCTGACAAATCGTGGCGGG 
GCCACCGCCGAAGATGTGGTGACGCTGGCGCGCGCCGTGCGCGATGGGGTCCATGATGTGTTT 
GGTATCACACTAAAACCCGAACCCGTGCTGATCGGCTGCATGTTGTAG

>Rv0483 - TB.seq 571708:573060 MW:47859
>emb|AL123456|MTBH37RV:571708-573063, Rv0483 SEQ ID NO:25

GTGGTCATTCGTGTGCTGTTrCGCCCGGTATCTTTGATACCCGTGAATAACTCCAGCACCCCCCA 

GAGTCAGGGGCCGATCAGTCGGCGTCTGGCGTTGACGGCCCTTGGGTTTGGGGTGTTGGCACC
GAACGTTCTGGTCGCGTGCGCCGGCAAAGTGACCAAGCTGGCCGAGAAGAGGCCGCCACCGG 
CGCCTCGTCTGACTTTCCGGCCTGCCGACTCTGCCGCCGACGTGGTGCCGATCGCGCCGATCA 
GCGTCGAGGTCGGTGACGGCTGGTTTCAGCGGGTCGCGCTGACCAATTCGGCAGGCAAGGTC 
GTCGCCGGGGCATACAGCCGGGATCGCACCATCTACACX3ATCACCGAGCCGCTGGGCTACGAC 

ACGACCTACACCTGGAGCGGTTCGGCCGTCGGCCATGACGGCAAGGCGGTTCCGGTGGCGGG
CAAGTTGACCACCGTGGCACCCGTCAAGACGATCAACGCGGGATTCCAGCTCGCCGACGGGCA 
GACCGTCGGGATCGCGGCGCCGGTGATTATTCAGTTCGATTCACCGATCAGCGACAAGGCCGC 
CGTCGAGCGGGCACTAACCGTGACCACCGACCCGCCTGTCGAGGGGGGCTGGGCCTGGCTGC 
CCGACGAGGCGCAGGGCGCTCGCGTGCACTGGCGTCCTCGGGAGTACTACCCGGCGGGTACC 

ACCGTCGACGTCGACGCCAAGCTGTATGGGCTGCCGTTCGGCGACGGCGCGTACGGCGCGCA
GGATATGTCGTTGCACTTCCAGATCGGTCGTCGTCAGGTGGTCAAGGCCGAAGTCTCGTCGCAC 
CGCATCCAAGTCGTCACCGATGCCGGCGTCATCATGGACTTCCCGTGCAGCTACGGCGAGGCC 
GACTTGGCGCGCAACGTCACCCGCAACGGCATCCACGTCGTCACCGAGAAATACTCGGACTTC 
TACATGTCCMCCCGGCCGGCGGTTACAGCCATATCCACGAACGTTGGGCGGTGCGGATTTCC 

AACAACGGCGAGTTCATCCATGCCAACCCTATGAGCGCCGGTGCCCAGGGCAACAGCAATGTC
ACCAACGGCTGTATCAACCTGTCGACGGAGAACGCCGAACAGTACTACCGCAGCGCGGTCTAC 
GGTGACCCGGTTGAGGTGACCGGCAGTTCGATCCAGCTGTCCTACGCCGACGGTGACATCTGG 
GACTGGGCGGTGGACTGGGACACCTGGGTGTCGATGTCGGCGCTACCGCCACCGGCGGCCAA 
ACCGGCGGCGACGCAAATCCGGGTCACCGCCCCGGTCACGCCGTCGGATGCCCCCACCCCGT 

CCGGCACACCCACGACTACTAACGGACCGGGTGGGTAG

>Rv0489 gpm phosphoglycerate mutase I TB.seq 578424:579170 MW:27217 
>emb|AL123456|MTBH37RV:578424-579173, gpm SEQ ID NO:26

ATGGCAAACACTGGCAGCCTGGTGTTGCTGCGCCACGGCGAGAGCGACTGGAATGCCCTCAAC 
CTGTTCACCGGCTGGGTCGATGTCGGCCTGACGGACAAGGGCCAGGCAGAGGCGGTTCGAAG
CGGCGAGCTGATCGCGGAACACGACCTATTGCCCGACGTGCTCTACACCTCGTTGCTGCGGCG 
CGCGATCACCACCGCGCATCTGGCGTTGGACAGCGCCGATCGGCTCTGGATTCCCGTGCGGCG 
tagctggcggctcaacgaacgcx:actacggcgcgctgcagggtttggacaaggccgagaccaa
ggcccgctatggcgaagagcagttcatggcctggcggcgcagctatgacacgccgccgccgc 
cgatcgagcggggcagtcagttcagccaggacgccgaccctcgttacgccgacatcggcggt 
ggcccgctcaccgmtgtctggctgacgtggtcgcccggtttttgccatatttcaccgacgtca 
tcgttggcgacttgcgggtcggcaagacggtgctgatcgttgcgcacggcaactcgttgcgcg
cgctggtcaagcacctggaccagatgtctgacgacgaaatcgtcggactgmcatcccgaccg 
gaattccgctgcgctacgacctggattccgcgatgaggccgctggtgcgcggtggtacgtatc 
tggacccggaggcggcagccgccggcgccgccgcggtggccggccagggccgcgggtaa 

>Rv0490 senX3 sensor histidine kinase TB.seq 579347:580576 MW:44794
>emb|AL123456|MTBH37RV:579347-580579, senX3 SEQ ID NO:27 

GTGACTGTGTTCTCGGCGCTGTTGCTGGCCGGGGTTTTGTCCGCGCTGGCACTGGCCGTCGGT 
GGTGCTGTTGGAATGCGGCTGACGTCGCGGGTCGTCGAACAGCGCCAACGGGTGGCCACGGA 
GTGGTCGGGAATCACGGTTTCGCAGATGTTGCAATGCATTGTCACGCTGATGCCGCTGGGCGC 

CGCGGTGGTGGACACCCATCGCGACGTTGTCTACCTCAACGAACGGGCCAAAGAGCTAGGTCT
GGTGCGCGACCGCCAGCTCGATGATCAGGCCTGGCGGGCCGCCCGGCAGGCGCTGGGTGGT 
GAAGACGTCGAGTTCGACCTGTCGCCGCGCAAGCGGTCGGCCACGGGTCGATCCGGGCTATC 
AGTGCATGGGCATGCCCGGTTGCTGAGCX3AGGAAGACCGCCGGTTCGCCGTGGTGTTCGTGCA 
CGACCAGTCGGATTATGCGCGGATGGAGGCGGCTAGGCGTGACTTCGTGGCCAACGTCAGTCA 

CGAGCTCAAGACGCCCGTCGGTGCCATGGCTCTACTCGCCGAGGCGCTGCTGGCGTCGGCCG
ACGACTCCGAAACCGTTCGGCGGTTCGCCGAGAAGGTGCTCATTGAGGCCAACCGGCTCGGTG 
ACATGGTCGCCGAGTTGATCGAGCTATCCCGGCTACAGGGCGCCGAGCGGCTACCCAATATGA
CCGACGTCGACGTCGATACGATTGTGTCGGAAGCGATTTCACGCCATAAGGTGGCGGCCGACA 
ACGCCGACATCGAAGTCCGCACCGACGCGCCCAGCAATCTGCX3GGTGCTGGGCGACCAAACTC 

TGCTGGTTACCGCACTGGCAAACCTGGTTTCCAATGCGATTGCCTATTCGCCGCGCX3GGTCGCT
GGTGTCGATCAGCCGTCGCCGTCGCGGTGCCAACATCGAGATCGCCGTCACCGACXJGGGGCA 
TCGGCATCGCGCCGGAAGACCAGGAGCGGGTCTTCGAACGGTTCTTCCGGGGGGAGAAGGCG 
CGCTGGCGTGCCACCGGAGGCAGCGGACTCGGGTTGGCCATCGTCAAACACGTCGCGGCTAAT 
CACGACGGCACCATCGGCGTGTGGAGCAAACCGGGAACCGGGTCAACGTTCACCTTGGCTCTT 

CCGGCGTTGATCGAGGCCTATCACGACGACGAGCGACCCGAGCAGGCGCGAGAGCCCGAACT
GCGGTCAAACAGGTCACAACGAGAGGAAGAGCTGAGCCGATGA 

>Rv0500 proC pyrroline-5-carboxylate reductase TB.seq 590081:590965 MW:30172
>emb|AL123456|MTBH37RV:590081-590968, proC SEQ ID NO:28 
ATGCTTTTCGGCATGGCAAGGATCGCGATTATCGGCGGCGGCAGCATCGGTGAGGCATTGCTG
TCGGGTCTGCTGCGGGCGGGCCGGCAGGTCAAAGACCTGGTAGTGGCCGAGCGGATGCCCGA 
TCGCGCCAACTACCTGGCGCAGACCTATTCGGTGTTGGTGACGTCGGCGGCCGACGCGGTGGA 
GAACGCGACGTTCGTCGTCGTCGCGGTCAAACCAGCCGACGTCGAGCCGGTGATCGCGGATCT
GGCGAACGCGACTGCGGCGGCCGAAAACGACAGTGCTGAGCAGGTGTTCGTCACCGTGGTAG 
CGGGCATCACGATCGCGTATTTCGAATCCAAGCTACCGGCTGGGACGCCAGTGGTGCGTGCGA 
TGCCGAACGCGGCGGCATTGGTGGGAGCGGGGGTTACAGCGCTGGCCAAAGGCCGCTTTGTC 
ACCCCGCAACAGCTTGAGGAGGTGTCGGCCTTGTTCGACGCGGTCGGCGGCGTGCTGACCGTT
CCGGMTCGCAGTTGGACGCGGTGACCGCGGTGTCCGGCTCGGGTCCGGCCTATTTCTTTCTG 
CTGGTCGAGGCCCTGGTGGATGCCGGAGTCGGGGTGGGCTTGAGCCGTCAGGTGGCCACCGA 
TCTCGCCGCGCAGACAATGGCTGGCTCAGCGGCGATGCTGCTGGAGCGGATGGAGCAAGACC 
AGGGTGGCGCCAATGGCGAGCTGATGGGGCTGCGCGTGGACCTTACCGCATCACGGCTGCGC 
GCCGCGGTTACCTCGCCGGGCGGTACGACCGCCGCTGCGCTGCGGGAACTCGAACGCGGCG
GGTTTCGGATGGCTGTCGACGCGGCGGTTCAAGCCGCCAAAAGCCGCTCTGAGCAGCTCAGAA 
TTACACCGGAATGA 

>Rv0528 - TB.seq 618303:619889 MW:57132
>emb|AL123456|MTBH37RV:618303-619892, Rv0528 SEQ ID NO:29 

ATGTGGCGGTCGTTGACGTCGATGGGCACCGCGCTGGTGCTGCTGTTTTTGCTCGCGCTGGCT
GCCATACCCGGGGCCCTGCTGCCGCAGCGTGGCCTCAACGCCGCCAAGGTGGACGACTACCT 
GGCCGCGCACCCACTCATCGGTCCGTGGCTGGACGAGCTGCAGGCCTTCGACGTGTTCTCCAG 
CTTCTGGTTCACCGCCATCTACGTGCTGCTGTTCGTGTCCCTCGTCGGCTGTCTGGCCCCGCGG 
ACGATCGAGCACGCCCGCAGCCTGCGGGCTACACCGGTCGCCGCCCCGCGCAACCTGGCCCG 

GCTGCCCAAGCACGCCCACGCCCGGCTGGCCGGCGAGCCCGCCGCCCTGGCCGCCACCATCA
CGGGCCGGCTGCGCGGCTGGCGCAGCATCACCCGGCAACAAGGCGACAGCGTGGAAGTCTCC 
GCCGAGAAGGGCTACCTGCGCGAGTTCGGCAACCTGGTGTTCCACTTCGCGCTGCTGGGTCTG 
CTGGTGGCGGTGGCCGTCGGCAAGCTGTTCGGCTACGAGGGCAACGTGATCGTGATAGCCGA 
CGGCGGACCCGG 1 1 I I IGTTCGGCGTCGCCGGCCGCGTTCGACTCGTTTCGCGCCGGCAACAC 

CGTCGACGGCACGTCGTTGCACCCGATCTGTGTGCGGGTCAACAACTTCCAAGCGCACTACCT
GCCGTCCGGGCAGGCCACCTCGTTCGCCGCCGACATCGACTATCAGGCCGACCCGGCCACTG 
CTGACCTGATCGCCAACAGCTGGCGGCCCTACCGGCTGCAGGTCAATCACCCGCTGCGGGTCG 
GCGGCGACCGGGTGTACCTGCAGGGCCACGGCTATGCGCCCACCTTCACCGTGACGTTCCCG 
GACGGGCAGACCCGCACGTCGACCGTGCAGTGGCGACCCGACAACCCGCAGACCCTGCTGTC 

GGCGGGCGTCGTGCGCATCGACCCGCCGGCCGGCAGCTACCCCAACCCCGACGAGCGTCGCA
AACACCAGATCGCCATGCAGGGCCTGCTGGCTCCCACCGAGCAGCTCGACGGCACCCTGCTGT 
CGTCGCGTTTCCCCGCGCTCAATGCCCCGGCGGTGGCCATCGACATCTACCGCGGCGACACCG 
GCCTGGACAGCGGGCGGCCCCAGTCGTTGTTCACCCTGGACCACCGGCTGATCGAGCAGGGC 
CGGCTGGTCAAGGAAAAGCGGGTCAACCTGCGCGCCGGTCAGCAAGTCCGCATCGACCAAGG 

CCCGGCGGCCGGCACGGTGGTCCGGTTCGACGGCGCGGTGCCGTTCGTCAACCTGCAGGTCT
CCCACGACCCCGGCCAGTCCTGGGTGCTGGTCTTCGCAATCACGATGATGGCGGGACTGCTGG 
TGTCGCTGCTGGTGCGCAGGCGCCGGGTGTGGGCGCGGATCACGCCGACGACCGCGGGTACG 
GTAAACGTCGAGCTGGGCGGCCTGACGCGCACCGACAACTCCGGGTG6GGCGCCGAGTTCGA
GCGGCTGACCGGGCGGTTGCTGGCGGGTTTTGAGGCGCGGTCCCCGGACATGGCCGAAGCGG 
CCGCAGGGACCGGAAGGGACGTCGATTGA 

>Rv0667 rpoB [beta] subunit of RNA polymerase TB.seq 759805:763320 MW:129220 
>emb|AL123456|MTBH37RV:759805-763323, rpoB SEQ ID NO:30 

TTGGCAGATTCCCGCCAGAGCAAAACAGCCGCTAGTCCTAGTCCGAGTCGCCCGCAAAGTTCCT 

CGAATAACTCCGTACCCGGAGCGCCAAACCGGGTCTCCTTCGCTAAGCTGCGCGAACCACTTG 

AGGTTCCGGGACTCCTTGACGTCCAGACCGATTCGTTCGAGTGGCTGATCGGTTCGGCGCGCT 

GGCGCGAATCCGCCGCCGAGCGGGGTGATGTCAACCCAGTGGGTGGCCTGGAAGAGGTGCTC 

TACGAGCTGTCTCCGATCGAGGACTTCTCCGGGTCGATGTCGTTGTCGTTCTCTGACCCTCGTT 

TCGACGATGTCAAGGCACCCGTCGACGAGTGCAAAGACAAGGACATGACGTACGCGGCTGCAC 

TGTTCGTCACCGCCGAGTTCATCAACAACAACACCGGTGAGATCAAGAGTCAGACGGTGTTCAT 

GGGTGACTTCCCGATGATGACCGAGAAGGGCACGTTCATCATCAACGGGACCGAGCGTGTGGT 

GGTCAGCCAGCTGGTGCGGTCGCCCGGGGTGTACTTCGACGAGACCATTGACAAGTCCACCGA 

CAAGACGCTGCACAGCGTCAAGGTGATCCCGAGCCGCGGCGCGTGGCTCGAGTTTGACGTCGA 

CAAGCGCGACACCGTCGGCGTGCGCATCGACCGCAAACGCCGGCAACCGGTCACCGTGCTGC 

TCAAGGCGCTGGGCTGGACCAGCGAGCAGATTGTCGAGCGGTTCGGGTTCTCCGAGATCATGC 

GATCGACGCTGGAGAAGGACAACACCGTCGGCACCGACGAGGCGCTGTTGGACATCTACCGCA 

AGCTGCGTCCGGGCGAGCCCCCGACCAAAGAGTCAGCGCAGACGCTGTTGGAAAACTTGTTCT 

TCAAGGAGAAGCGCTACGACCTGGCCCGCGTCGGTCGCTATAAGGTCAACAAGAAGCTCGGGC 

TGCATGTCGGCGAGCCCATCACGTCGTCGACGCTGACCGAAGAAGACGTCGTGGCCACCATCG 

AATATCTGGTCCGCTTGCACGAGGGTCAGACCACGATGACCGTTGCGGGCGGCGTCGAGGTGC 

CGGTGGAAACCGACGACATCGACCACTTCGGCAACCGCCGCCTGCGTACGGTCGGCGAGCTG 

ATCCAAAACCAGATCCGGGTCGGCATGTCGCGGATGGAGCGGGTGGTCCGGGAGCGGATGAC 

CACCCAGGACGTGGAGGCGATCACAGCGCAGACGTTGATCAACATCCGGCCGGTGGTCGCCG 

CGATCAAGGAGTTCTTCGGCACCAGCCAGCTGAGCCAATTCATGGACCAGAACAACCCGCTGTC 

GGGGTTGACCCACAAGCGCCGACTGTCGGCGCTGGGGCCCGGCGGTCTGTCACGTGAGCGTG 

CCGGGCTGGAGGTCCGCGACGTGCACCCGTCGCACTACGGCCGGATGTGCCCGATCGAAACC 

CCTGAGGGGCCCAACATCGGTCTGATCGGCTCGCTGTCGGTGTACGCGCGGGTCAACCCGTTC 

GGGTTCATCGAAACGCCGTACCGCAAGGTGGTCGACGGCGTGGTTAGCGACGAGATCGTGTAC 

CTGACCGCCGACGAGGAGGACCGCCACGTGGTGGCACAGGCCAATTCGCCGATCGATGCGGA 

CGGTCGCTTCGTCGAGCCGCGCGTGCTGGTCCGCCGCAAGGCGGGCGAGGTGGAGTACGTGC 

CCTCGTCTGAGGTGGACTACATGGACGTCTCGCCCCGCCAGATGGTGTCGGTGGCCACCGCGA 

TGATTCCCTTCCTGGAGCACGACGACGCCAACCGTGCCCTCATGGGGGCAAACATGCAGCGCC 

AGGCGGTGCCGCTGGTCCGTAGCGAGGCCCCGCTGGTGGGCACCGGGATGGAGCTGCGCGC 

6GCGATCGACGCCGGCGACGTCGTCGTCGCCGAAGAAAGCGGCGTCATCGAGGAGGTGTCGG 
CCGACTACATCACTGTGATGCACGACAACGGCACCCGGCQTACCTACCQGATGCGCAAGTTTG
CCCGGTCCAACCACGGCACTTGCGCCAACCAGTGCCCCATCGTGGACGCGGGCGACCGAGTC 
GAGGCCGGTCAGGTGATCGCCGACGGTCCCTGTACTGACGACGGCGAGATGGCGCTGGGCAA 
GAACCTGCTGGTGGCCATCATGCCGTGGGAGGGCCACAACTACGAGGACGCGATCATCCTGTC 

CAACCGCCTGGTCGAAGAGGACGTGCTCACCTCGATCCACATCGAGGAGCATGAGATCGATGC
TCGCGACACCAAGCTGGGTGCGGAGGAGATCACCCGCGACATCCCGAACATCTCCGACGAGGT 
GCTCGCCGACCTGGATGAGCGGGGCATCGTGCGCATCGGTGCCGAGGTTCGCGACGGGGACA 
TCCTGGTCGGCAAGGTCACCCCGAAGGGTGAGACCGAGCTGACGCCGGAGGAGCGGCTGCT6 
CGTGCCATCTTCGGTGAGAAGGCCCGCGAGGTGCGCGACACTTCGCTGAAGGTGCCGCACGG 

CGAATCCGGCAAGGTGATCGGCATTCGGGTGTTTTCCCGCGAGGACGAGGACGAGTTGCCGGC
CGGTGTCAACGAGCTGGTGCGTGTGTATGTGGCTCAGAAACGCAAGATCTCCGACGGTGACAA 
GCTGGCCGGCCGGCACGGCAACAAGGGCGTGATCGGCAAGATCCTGCCGGTTGAGGACATGC 
CGTTCCTTGCCGACGGCACCCCGGTGGACATTATTTTGAACACCCACGGCGTGCCGCGACGGA 
TGAACATCGGGCAGATTTTGGAGACCCACCTGGGTTGGTGTGCCCACAGCGGCTGGAAGGTCG 

ACGCCGCCAAGGGGGTTCCGGACTGGGCCGCCAGGCTGCCCGACGAACTGCTCGAGGCGCAG
CCGAACGCCATTGTGTCGACGCCGGTGTTCGACGGCGCCCAGGAGGCCGAGCTGCAGGGCCT 
GTTGTCGTGCACGCTGCCCAACCGCGACGGTGACGTGCTGGTCGACGCCGACGGCAAGGCCA 
TGCTCTTCGACGGGCGCAGGGGCGAGCCGTTCCCGTACCCGGTCACGGTTGGCTACATGTACA 
TCATGAAGCTGCACCACCTGGTGGACGACAAGATCCACGCCCGCTCCACCGGGCCGTACTCGA 

TGATCACCCAGCAGCCGCTGGGCGGTAAGGCGCAGTTCGGTGGCCAGCGGTTCGGGGAGATG
GAGTGCTGGGCCATGCAGGCCTACGGTGCTGCCTACACCCTGCAGGAGCTGTTGAGCATCAAG 
TCCGATGACACCGTCGGCCGCGTCAAGGTGTACGAGGCGATCGTCAAGGGTGAGAACATCCCG 
GAGCCGGGCATCCCCGAGTCGTTCAAGGTGGTGCTCAAAGAACTGCAGTCGCTGTGCCTCAAC 
GTCGAGGTGCTATCGAGTGACGGTGCGGCGATCGAACTGCGCGAAGGTGAGGACGAGGACCT 

GGAGCGGGCCGCGGCCAACCTGGGAATCAATCTGTCCCGCAACGAATCCGCAAGTGTCGAGGA
TCTTGCGTAA 

>Rv0668 rpoC [beta]' subunit of RNA polymerase TB.seq 763368:767315 MW:146740
>emb|AL123456|MTBH37RV:763368-767318, rpoC SEQ ID NO:31 

GTGCTCGACGTCAACTTCTTCGATGAACTCCGCATCGGTCTTGCTACCGCGGAGGACATCAGGC
AATGGTCCTATGGCGAGGTCAAAAAGCCGGAGACGATCAACTACCGCACGCTTAAGCCGGAGA 
AGGACGGCCTGTTCTGCGAGAAGATCTTCGGGCCGACTCGCGACTGGGAATGCTACTGCGGCA 
AGTACAAGCGGGTGCGCTTCAAGGGCATCATCTGCGAGCGCTGCGGCGTCGAGGTGACCCGC 
GCCAAGGTGCGTCGTGAGCGGATGGGCCACATCGAGCTTGCCGCGCCCGTCACCCACATCTG 

GTACTTCAAGGGTGTGCCCTCGCGGCTGGGGTATCTGCTGGACCTGGCCCCGAAGGACCTGGA
GAAGATCATCTACTTCGCTGCCTACGTGATCACCTCGGTCGACGAGGAGATGCGCCACAATGAG 
CTCTCCACGCTCGAGGCCGAAATGGCGGT6GAGCGCAAGGCCGTCGAAGACCAGCGCGAC6G 
CGAACTAQAGGCCCGGGCGCAAAAGCTGGAGGCCGACCTGGCCGAGCTGGAGGCCGAGGGC
GCCAAGGCCGATGCGCGGCGCAAGGTTCGCGACGGCGGCGAGCGCGAGATGCGCCAGATCC 
GTGACCGCGCGCAGCGTGAGCTGGACCGGTTGGAGGACATCTGGAGCACTTTCACCAAGCTGG 
CGCCCAAGCAGCTGATCGTCGACGAAAACCTCTACCGCGAACTCGTCGACCGCTACGGCGAGT 
ACTTCACCGGTGCCATGGGCGCGGAGTCGATCCAGAAGCTGATCGAGAACTTCGACATCGACG
CCGAAGCCGAGTCGCTGCGGGATGTCATCCGAAACGGCAAGGGGCAGAAGAAGCTTCGCGCC 
CTCAAGCGGCTGAAGGTGGTTGCGGCGTTCCAACAGTCGGGCAACTCGCCGATGGGCATGGTG 
CTCGACGCCGTCCCGGTGATCCCGCCGGAGCTGCGCCCGATGGTGCAGCTCGACGGCGGCCG 
GTTCGCCACGTCCGACTTGAACGACCTGTACCGCAGGGTGATCAACCGCAACAACCGGCTGAA 

AAGGCTGATCGATCTGGGTGCGCCGGAAATCATCGTCAACAACGAGAAGCGGATGCTGCAGGA
ATCCGTGGACGCGCTGTTCGACAATGGCCGCCGCGGCCGGCCCGTCACCGGGCCGGGCAACC 
GTCCGCTCAAGTCGCTTTCCGATCTGCTCAAGGGCAAGCAGGGCCGGTTCCGGGAGAACCTGC 
TCGGCAAGCGTGTCGACTACTCGGGCCGGTCGGTCATCGTGGTCGGCCCGCAGCTCAAGCTGC 
ACCAGTGCGGTCTGCCCAAGCTGATGGCGCTGGAGCTGTTCAAGCCGTTCGTGATGAAGCGGC 

TGGTGGACCTCAACCATGCGCAGAACATCAAGAGCGCCAAGCGCATGGTGGAGCGCCAGCGCC
CCCAAGTGTGGGATGTGCTCGAAGAGGTCATCGCCGAGCACCCGGTGTTGCTGAACCGCGCAC 
CCACCCTGCACCGGTTGGGTATCCAGGCCTTCGAGCCAATGCTGGTGGAAGGCAAGGCCATTC 
AGCTGCACCCGTTGGTGTGTGAGGCGTTCAATGCCGACTTCGACGGTGACCAGATGGCCGTGC 
ACCTGCCTTTGAGCGCCGMGCGCAGGCCGAGGCTCGCATTTTGATGTTGTCCTCCAACAACAT 

CCTGTCGCCGGCATCTGGGCGTCCGTTGGCCATGCCGCGGCTGGACATGGTGACCGGGCTGT
ACTACCTGACCACCGAGGTCCCCGGGGACACCGGCGAATACCAGCCGGCCAGCGGGGATCAC 
CCGGAGACTGGTGTCTACTCTTCGCCGGCCGAAGCGATCATGGCGGCCGACCGCGGTGTCTTG 
AGCGTGCGGGCCAAGATCAAGGTGCGGCTGACCCAGCTGCGGCCGCCGGTCGAGATCGAGGC 
CGAGCTATTCGGCCACAGCGGCTGGCAGCCGGGCGATGCGTGGATGGCCGAGACCACGCTGG 

GCCGGGTGATGTTCAACGAGCTGCTGCCGCTGGGTTATCCGTTCGTCAACAAGCAGATGCACAA
GAA6GTGCAGGCCGCCATCATCAACGACCTGGCCGAGCGTTACCCGATGATCGTGGTCGCCCA 
GACCGTCGACAAGCTCAAGGACGCCGGCTTCTACTGGGCCACCCGCAGCGGCGTGACGGTGT 
CGATGGCCGACGTGCTGGTGCCGCCGCGCAAGAAGGAGATCCTCGACCACTACGAGGAGCGC 
GCGGACAAGGTCGAAAAGCAGTTCCAGCGTGGCGCTTTGAACCACGACGAGCGCAACGAGGC 

GCTGGTGGAGATTTGGAAGGAAGCCACCGACGAGGTCGGTCAGGCGTTGCGGGAGCACTACC
CCGACGACAACCCGATCATCACCATCGTCGACTCCGGCGCCACCGGCAACTTCACCCAGACTC 
GAACGCTGGCCGGTATGAAGGGCCTGGTGACCAACCCGAAGGGTGAGTTCATCCCGCGTCCG 
GTCAAGTCGTCCTTCCGTGAGGGCCTGACCGTGCTGGAGTACTrCATCAACACCCACGGCGCTC 
GAAAGGGCTTGGCGGACACCGCGTTGCGCACCGCCGACTCCGGCTACCTGACCCGACGTCTG 

GTGGACGTGTCCCAGGACGTGATCGTGCGCGA6CACGACTGCCAGACCGAGCGCGGCATCGT
CGTCGAGCTGGCCGAGCGTGCACCCGACGGCACGCTGATCCGCGACCCGTACATCGAAACCTC 
GGCCTACGCGCGGACCCTGGGCACCGACGCGGTCGACGAGGCCGGCAACGTGATCGTCGAGC 
GTGGTCAAGACCTG6GCGATCCGGAGATTGACGCTCTGTTGGCTGCTGGTATTACCCAGGTCAA
GGTGCGTTCGGTGCTGACGTGTGCCACCAGCACCGGCGTGTGCGCGACCTGCTACGGGCGTT 
CCATGGCCACCGGCAAGCTGGTCGACATCGGTGAAGCCGTCGGCATCGTGGCCGCCCAGTCC 
ATCGGCGAACCCGGCACCCAGCTGACCATGCGCACCTTCCACCAGGGTGGCGTCGGTGAGGA 
CATCACCGGTGGTCTGCCCCGGGTGCAGGAGCTGTTCGAGGCCCGGGTACCGCGTGGCAAGG
CGCCGATCGCCGACGTCACCGGCCGGGTTCGGCTCGAGGACGGCGAGCGGTTCTACAAGATC 
ACCATCGTTCCTGACGACGGCGGTGAGGAAGTGGTCTACGACAAGATCTCCAAGCGGCAGCGG 
CTGCGGGTGTTCAAGCACGAAGACGGTTCCGAACGGGTGCTCTCCGATGGCGACCACGTCGAG 
GTGGGCCAGCAGCTGATGGAAGGCTCGGCCGACCCGCATGAGGTGCTGCGGGTGCAGGGCCC 

CCGCGAGGTGCAGATACACCTGGTTCGCGAGGTCCAGGAGGTCTACCGCGCCCAAGGTGTGTC
GATCCACGACAAGCACATCGAGGTGATCGTTCGCCAGATGCTGCGCCGGGTGACCATCATCGA 
CTCGGGCTCGACGGAGTTTTTGCCTGGCTCGCTGATCGACCGCGCGGAGTTCGAGGCAGAGAA 
CCGCCGAGTGGTGGCCGAGGGCGGTGAGCCCGCGGCCGGCCGTCCGGTGCTGATGGGCATC 
ACGAAGGCGTCGCTGGCCACCGACTCGTGGCTGTCGGCGGCGTCGTTCCAGGAGACCACTCG 

CGTGCTGACCGATGCGGCGATCAACTGCCGCAGCGATAAGCTCAACGGTCTGAAGGAAAACGT
GATCATCGGCAAGCTGATCCCGGCCGGTACCGGTATCAACCGCTACCGCAACATCGCGGTGCA 
GCCCACCGAGGAGGCCCGCGCTGCGGCGTACACCATCCCGTCGTATGAGGATCAGTACTACAG 
CCCGGACTTCGGTGCGGCCACCGGTGCTGCCGTCCCGCTG6ACGACTAGGGCTACAGCGACTA 
CCGCTAG 

>Rv0711 atsA TB.seq 806333:808693 MW:86216
>emb|AL123456|MTBH37RV:806333-808696, atsA SEQ ID NO:32

ATGGCACCCGAGGCCACCGAGGCGTTCAACGGCACCATCGAGCTGGATATTCGTGATTCGGAG 
CCGGATTGGGGCCCATACGCAGCGCCGGTGGCACCGGAGCACTCACCAAACATCCTGTATCTG 

GTCTGGGACGACGTCGGCATCGCGACCTGGGACTGCTTTGGCGGCCTGGTCGAGATGCCCGC
GATGACGCGCGTCGCCGAGCGTGGCGTGCGACTGTCGCMTTTCACACGACCGCACTGTGCTC 
GCCGACCCGGGCGTCGCTGCTGACCGGTCGCAACGCCACCACCGTAGGCATGGCTACCATCG 
AAGAGTTCACCGACGGGTTCCCCAACTGCAACGGGCGGATCCCGGCTGACACCGCGTTGCTCC 
CAGAGGTGCTGGCCGAACATGGCTACAACACCTACTGTGTGGGCAAGTGGCACCTGACGCCAC 

TCGAAGAATCCAATATGGCGTCGACGAAGCGGCACTGGCCGACCTCGCGTGGGTTCGAGCGGT
TCTACGGATTCCTAGGCGGGGAGACCGACCAGTGGTATCCCGACCTGGTATACGACAACCACC 
CAGTGAGTCCTCCCGGCACACCCGAGGGTGGCTACCACCTGTCAAAAGACATCGCCGACAAGA 
CGATCGAGTTCATTCGTGATGCCAAGGTGATCGCGCCCGACAAGCCGTGGTTGAGCTACGTGTG 
CCCAGGCGCCGGGCATGCGCCGCACCACGTCTTCAAGGAATGGGCGGACAGATACGCCGGCC 

GATTCGACATGGGGTATGAGCGCTATCGCGAGATCGTGCTGGAAAGGCAAAAGGCGCTAGGGA
TCGTGCCACCCGACACCGAACTGTCGCCCATAAACCCTTATCTGGATGTGCCGGGGCCAAACG 
GCGAGACCTGGCCGCTGCAGGACACGGTGCGGCCGTGGGACTCGCTGAGCGATGAAGAAAAG 
MGCTGTTTTGCCGGATGQCCGAGGTGTTCGCCGGCTTTCTGAGCTACACCGACGCCCAGATC
GGACGGATCCTGGACTACCTCGAGGAATCCGGCCAGCTGGACAACACCATCATCGTGGTGATC 
TCCGAGAACGGCGCCAGCGGCGAGGGCGGACCCAACGGATCGGTCAACGAAGGCAAGTTCTT 
CAACGGCTACATCGACACCGTCGCTGAAAGCATGAAGCTCTTCGACCACCTCGGTGGCCCGCA 

GACCTACAACCACTACCXJCATCGGGTGGGCAATGGCCTTCAACACCCCCTACAAGCTGTTCAAG
CGCTACGCCTCGCATGAAGGCGGCATTGCCGACCCGGCAATCATCTCCTGGCCCAACGGCATT 
GCCGCACACGGTGAAATCCGCGACAACTACGTCAATGTCAGCGACATCACGCCCACCGTCTAC 
GACCTGTTGGGCATGACACCGCCGGGGACCGTCAAGGGGATTCCGCAGAAACCGATGGACGG 
CGTGAGCTTCATAGCGGCCCTTGCCGACCCGGCCGCCGACACCGGCAAGACCACCCAGTTCTA 

CACCATGCTGGGCACCCGCGGGATCTGGCATGAAGGTTGGTTCGCCAACACCATTCACGCGGC
CACGCCCGCCGGCTGGTCGAATTTCAACGCTGACCGCTGGGAACTGTTCCACATCGCAGCAGA 
CCGCAGCCAGTGCCACGACCTGGCCGCCGAGCATCXJCGACAAACTTGAGGAGCTCAAGGCGCT 
GTGGTTCTCCGAAGCCGCCAAGTACAACGGGCTGCCGCTGGCCGATCTGAACCTCCTGGAAAC 
GATGACTCGGTCGCGGCGTTACCTGGTCAGCGAACGAGCCAGCTACGTCTACTATCCCGACTG 

CGCTGACGTCGGCATCGGCGCGGCCGTAGAGATTCGCGGGCGCTCGTTCGCCGTGCTGGCCG
ATGTGACCATCGATACCACCGGCGCCGAGGGCGTGCTGTTCAAGCACGGCGGCGCCCATGGC 
GGGCACGTGCTGTTCGTCCGGGACGGACGCTTGCACTACGTCTACAACTTCCTCGGTGAGCGC 
CAGCAGCTGGTCAGCTCGTCGGGTCCGGTCCCGTCGGGAAGACATCTACTCGGGGTTCGTTAT 
TTGCGGACCGGAACCGTGCCCAACAGTCACACGCCGGTGGGCGATCTTGAGCTGTTCTTCGAC 

GAGAACCTGGTCGGCGCCCTGACCAATGTGCTGACCCACCCTGGAACGTTCGGGTTGGCCGGC
GCCGCTATCAGCGTTGGCCGCAACGGCGGTTCGGCTGTGTCCAGCCACTACGAAGCGCCGTTC 
GCGTTCACCGGCGGTACCATCACCCAGGTCACCGTCGACGTGTCAGGCCGACCGTTCGAAGAT 
GTGGAATCCGATCTTGCGCTTGCTTTTTCGCGTGACTGA 

>Rv0764c - lanosterol 14-demethylase cytochrome P450 TB.seq 856683:858035 MW:50879
>emb|AL123456|MTBH37RV:c858035-856680, Rv0764c SEQ ID NO:33 

ATGAGCGCTGTTGCACTACCCCGGGTTTCGGGTGGCCACGACGAACACGGCCACCTCGAGGAG 
TTCCGCACCGATCCGATCGGGCTGATGCAACGGGTCCGCGACGAATGCX3GAGACGTCGGTACC 
TTCCAGCTGGCCGGGAAGCAGGTCGTGCTGCTGTCCGGCTCGCACGCCAACGAATTCTTCTTC 

CGGGCGGGCGACGACGACCTGGACCAGGCCAAGGCATACCCGTTCATGACGCCGATCTTCGG
CGAGGGCGTGGTGTTCGACGCCAGCCCGGAACGGCGTAAAGAGATGCTGCACAATGCCGCGC 
TACGCGGCGAGCAGATGAAGGGCCACGCTGCCACCATCGAAGATCAAGTCCGACGGATGATCG 
CCGACTGGGGTGAGGCCGGCGAGATCGATCTGCTGGACTTCTTCGCCGAGCTGACCATCTACA 
CCTCCTCGGCCTGCCTGATCGGCAAGAAGTTCCGCGACCAGCTCGACGGGCGATTCGCCAAGC 

TCTATCACGAGTTGGAGCGCGGCACCGACCCACTAGCCTACGTCGACCCGTATCT6CCGATCG
AGAGCTTCCGTCGCCGCGACGAAGCCCGCAATGGTCTGGTGGCACTGGTTGCGGACATCATGA 
ACGGCCGGATCGCCAACCCACCCACCGACAAGAGCGACCGTGACATGCTCGACGTGCTCATCG 
CCGTCAAGGCTGAGACCGGCACTCCCCQGTTCTCGGCCGACGAGATCACCGGCATGTTCATCT
CGATGATGTTCGCCGGCCATCACACCAGCTCGGGTACGGCTTCGTGGACGCTGATCGAGTTGA 
TGCGCCATCGCGACGCCTACGCGGCCGTGATCGACGAACTCGACGAGCTGTACGGCGACGGC 
CGATCGGTGAGTTTCCATGCGCTGCGCCAGATTCCGCAGCTGGAAAACGTGCTGAAAGAGACG 

CTGCGCCTGCACCCTCCGCTGATCATCCTCATGCGAGTGGCCAAGGGCGAGTTCGAGGTGCAA
GGCCACCGGATTCATGAGGGCGATCTGGTGGCGGCCTCCCCGGCGATCTCCAACCGGATCCCC 
GAAGACTTCCCCGATCCCCACGACTTCGTGCCAGCACGATACGAGCAGCCGCGCCAGGAAGAT 
CTGCTCAACCGCTGGACGTGGATTCCGTTCGGCGCCGGCCGGCATCGTTGCGTGGGGGCGGC 
GTTCGCCATCATGCAGATCAAAGCGATCTTCTCGGTGTTGTTGCGCGAGTATGAGTTTGAGATG 

GCGCAACCGCCAGAAAGCTATCGTAACGACCATTCGAAGATGGTGGTGCAGTTGGCCCAGCCC
GCTTGCGTGCGCTACCGCCGGCGAACGGGAGTTTAA 

>Rv0861c - DNA helicase TB.seq 958524:960149 MW:59773 
>emb|AL123456|MTBH37RV:c960149-958521, Rv0861c SEQ ID NO:34

GTGCAGTCCGATAAGACGGTGCTGTTGGAAGTCGACCATGAACTGGCCGGCGCTGCACGCGCC
GCCATCGCGCCGTTCGCCGAGCTGGAACGTGCACCCGAACATGTCCACACCTACCGCATCACA 
CCGCTGGCACTGTGGAATGCTCGCGCCGCCGGCCATGATGCCGAGCAAGTCGTCGACGCGCT 
GGTCAGTTACTCCCGCTACGCGGTGCCGCAACCCTTGCTCGTCGACATCGTCGACACCATGGC 
CCGCTACGGACGACTGCAGTTGGTCAAGAACCCGGCCCATGGCCTGACGCTGGTGAGCCTGGA 

CCGCGCGGTGCTTGAGGAAGTGCTGCGCAACAAGAAGATCGCGCCGATGCTTGGCGCCCGCAT
CGATGACGACACCGTCGTCGTCCACCCCAGCGAACGCGGCCGGGTCAAGCAGCTGCTGCTCAA 
GATCGGTTGGCCCGCAGAGGATCTCGCCGGCTACGTCGATGGTGAAGCGCACCCGATCAGCCT 
GCACCAGGAGGGCTGGCAGCTGCGCGATTACCAGCGGCTGGCCGCX3GACTCGTTCTGGGCGG 
GCGGCTCCGGGGTGGTGGTGCTGCCATGTGGGGCCGGCAAGACGCTGGTCGGTGCGGCCGC 

AATGGCCAAAGCCGGCGCGACGACGTTGATCCTGGTCACCAATATCGTCGCGGCCCGGCAATG
GAAACGAGAGCTGGTCGCGCGCACCTCGCTCACCGAGAATGAGATCGGCGAATTCTCGGGAGA 
ACGCAAGGAAATCCGACCTGTCACCATCTCGACATACCAGATGATCACCCGCCGCACTAAGGGC 
GAGTACCGCCATCTGGAACTGTTCGACAGCCGCGACTGGGGGCTCATCATCTATGACGAGGTG 
CACCTGTTGCCGGCACCGGTCTTCCGGATGACCGCTGACCTGCAGTCCAAACGGCGGCTGGGG 

CTGACCGCCACGTTGATCCGTGAAGACGGACGCGAGGGCGACGTGTTTTCCCTTATCGGACCA
AAGCGCTATGACGCGCCGT6GAAGGACATTGAGGCGCAGGGCTGGATCGCGGCAGCTGAGTG 
CGTGGAAGTCCGGGTCACGATGACCGACAGCGAGCGGATGATGTACGCCACCGCCGAACCCG 
AAGAACGCTACCGGATCTGCTCGACGGTGCACACCAAAATTGCTGTGGTCAAGTCGATTCTGGC 
GAAGCACCCGGATGAGCAGACCCTGGTCATCGGAGCGTACTTGGATCAGCTCGACGAGCTGGG 

CGCCGAGCTCGGCGCTCCGGTGATTCAGGGGTCGACAAG6ACCAGCGAACGCGAGGCACTGT
TCGACGCCTTCCGCCGCGGCGAGGTCGCTACGCTCGTGGTGTCCAAGGTGGCTAACTTCTCCA 
TCGACTTGCCGGAAGCCGCCGTGGCGGTACAGGTTTCGGGAACATTCGGCTCACGCCAGGAAG 
AGGCGCAACGGCTCGGCCGGATATTGCGACCCAAGGCCGACGGGGGCGGTGCCATCTTCTAC
TCGGTGGTGGCCCGCGACAGCCTGGATGCCGAGTACGCCGCACACCGGCAGCGG F TTTTA6CT 
GAGCAGGGCTACGGTTACATCATCCGCGACGCCGACGACCTGCTGGGCCCGGCAATTTAG 

>Rv0904c accD3 TB.seq 1006694:1008178 MW:51741
>emb|AL123456|MTBH37RV:c1008178-1006691, accD3 SEQ ID NO:35

GTGAGTCGTATCACGACCGACCAACTGCGGCACGCGGTGCTAGACCGGGGATCTTTCGTCAGC 
TGGGATAGCGAGCCGCTGGCGGTGCCGGTAGCCGACTCCTATGCGCGGGAGCTGGCCGCCGC 
TCGGGCGGCCACCGGCGCGGACGAATCGGTGCAGACCGGTGAGGGACGCGTATTCGGGCGG 

CGGGTGGCCGTGGTGGCCTGTGAGTTCGACTTCCTGGGCGGCTCGATTGGGGTGGCAGCGGC
CGAACGGATCACCGCCGCCGTCGAGCGGGCGACCGCCGAGCGGCTGCCGCTACT6GCGTCAC 
CAAGCTCGGGAGGCACCCGCATGCAAGAAGGCACGGTCGCGTTTCTGCAGATGGTGAAGATCG 
CTGCGGCCATCCAGCTGCACAACCAGGCGCGCCTGCCCTACCTGGTCTATTTGCGCCATCCGA 
CCACGGGTGGAGTTTTCGCGTCGTGGGGCTCGCTGGGGCATCTCACCGTCGCCGAGCCGGGC 

GCCCTGATCGGCTTTCTGGGACCACGGGTCTATGAGTTGCTCTATGGCGACCCCTTCCCATCCG
GCGTCCAAACCGCCGAGAATCTACGGCGGCATGGGATCATCGACGGCGTCGTTGCACTGGACC 
GGCTACGACCGATGCTGGATCGTGCGTTGACGGTGCTCATCGACGCTCCCGAACCGCTTCCGG 
CACCGCAGACGCCCGCGCCCGTACCCGATGTGCCCACGTGGGACTCGGTGGTGGCATCGCGC 
CGGCCGGACCGGCCGGGCGTCAGGCAGCTACTGCGACACGGCGCCACCGACCGGGTGTTGTT 

GTCAGGAACCGATCAAGGCGAAGCGGCGACCACGCTGCTGGCGCTGGCCCGCTTTGGCGGCC
AACCCACGGTGGTCCTCGGCCAGCAAAGGGCAGTAGGCGGCGGGGGAAGCACTGTCGGGCCC 
GCTGCGTTACGCGAAGCCCGACGCGGGATGGCGCTCGCCGCCGAGCTGTGCCTGCCGCTGGT 
GCTGGTCATTGACGCGGCCGGACCCGCGTTGTCGGCCGCAGCCGAACAGGGCGGGCTGGCCG 
GCCAGATCGCGCATTGCCTGGCCGAGCTCGTCACGCTGGATACCCCGACCGTGTCGATCCTGC 

TGGGCCAGGGCAGCGGCGGGCCGGCGCTGGCGATGTTGCCCGCCGACCGGGTGCTGGCCGC
ACTCCACGGCTGGCTGGCGCCCTTGCCTCCCGAAGGAGCCAGCGCGATCGTGTTCCGAGACAC 
TGCTCATGCCGCCGAACTCGCTGCCGCCCAAGGCATCCGGTCGGCCGACCTACTGAAGTCGGG 
GATTGTCGACACCATCGTGCCGGAGTACCCCGACGCCGCAGACGAGCCGATCGAGTTCGCCCT 
ACGACTGTCGAACGCCATCGCCGCCGAAGTGCAGGCGTTACGGAAGATACCGGCCCCGGAACG 

CCTCGCGACTCGGTTGCAACGCTACCGCCGGATCGGGTTGCCCCGCGACTAA

>Rv0983 - TB.seq 1099064:1100455 MW:46454

>emb|AL123456|MTBH37RV:1099064-1100458, Rv0983 SEQ ID NO:36 

ATGGCCAAGTTGGCCCGAGTAGTGGGCCTAGTACAGGAAGAGCAACCTAGCGACATGACGAAT 
CACCCACGGTATTCGCCACCGCCGCAGCAGCCGGGAACCCGAGGTTATGCTCAGGGGCAGCA
GCAAACGTACAGCCAGCAGTTCGACTGGCGTTACCCACCGTCCCCGCCCCCGCAGCCAACCCA 
GTACCGTCAACCCTACGAGGCGTTGGGTGGTACCCGGCCGGGTCTGATACCTGGCGTGATTCC 
GACCATGACGCCCCCTCCTGGGATGGTTCGCCAACGCCCTCGTGCAGGCATGTTGGCCATCGG
CGCGGTGACGATAGCGGTGGTGTCCGCCGGCATCGGCGGCGCGGCCGCATCCCTGGTCGGGT 
TCAACCGGGCACCCGCCGGCCCCAGCGGCGGCCCAGTGGCTGCCAGCGCGGCGCCAAGCAT 
CCCCGCAGCAAACATGCCGCCGGGGTCGGTCGAACAGGTGGCGGCCAAGGTGGTGCCCAGTG 
TCGTCATGTTGGAMCCGATCTGGGCCGCCAGTCGGAGGAGGGCTCCGGCATCATTCTGTCTG
CCGAGGGGCTGATCTTGACCAACAACCACGTGATCGCGGCGGCCGCCAAGCCTCCCCTGGGC 
AGTCCGCCGCCGAAAACGACGGTAACCTTCTCTGACGGGCGGACCGCACCCTTCACGGTGGTG 
GGGGCTGACCCCACCAGTGATATCGCCGTCGTCCGTGTTCAGGGCGTCTCCGGGCTCACCCCG 
ATCTCCCTGGGTTCCTCCTCGGACCTGAGGGTCGGTCAGCCGGTGCTGGCGATCGGGTCGCCG 

CTCGGTTTGGAGGGCACCGTGACGACGGGGATCGTCAGCGCTCTCAACCGTCCAGTGTCGACG
ACCGGCGAGGCCGGCAACCAGAACACCGTGCTGGACGCCATTCAGACCGACGCCGCGATCAA 
CCCCGGTAACTCCGGGGGCGCGCTGGTGAACATGAACGCTCAACTCGTCGGAGTCAACTCGGC 
CATTGCCACGCTGGGCGCGGACTCAGCCGATGCGCAGAGCGGCTCGATCGGTCTCGGTTTTGC 
GATTCCAGTGGACCAGGCCAAGCGCATCGCCGACGAGTTGATCAGCACCGGCAAGGCGTCACA 

TGCCTCCCTGGGTGTGCAGGTGACCAATGACAAAGACACCCTGGGCGCCAAGATCGTCGAAGT
AGTGGCCGGTGGTGCTGCCGCGAACGCTGGAGTGCCGAAGGGCGTCGTTGTCACCAAGGTCG 
ACGACCGCCCGATCAACAGCGCGGACGCGTTGGTTGCCGCCGTGCGGTCCAAAGCGCCGGGC 
GCCACGGTGGCGCTAACCTTTCAGGATCCCTCGGGGGGTAGCCGCACAGTGCAAGTCACCCTC 
GGCAAGGCGGAGCAGTGA 

>Rv1008 - Similar to E.coli protein YcfH TB.seq 1127087:1127878 MW:29066
>emb|AL123456|MTBH37RV:1127087-1127881, Rv1008 SEQ ID NO:37

TTGGTCGACGCCCACACCCATCTCGACGCGTGCGGTGCACGAGACGCCGATACGGTGCGGTC 
GCTCGTCGAGCGAGCCGCCGCGGCCGGCGTGACCGCGGTGGTCACCGTCGCCGACGACCTG 

GAGTCCGCGCGCTGGGTCACCCGCGCGGCCGAATGGGATCGGCGAGTCTATGCCGCGGTGGC
GTTGCACCCGACCCGCGCCGATGCGCTCACCGACGCTGCCCGTGCCGAGCTCGAGCGATTGG 
TTGCCCACCCCAGGGTGGTGGCCGTCGGTGAGACCGGAATCGACATGTACTGGGCGGGTCGC 
CTGGACGGGTGTGCGGAGCCGCACGTCCAGCGGGAGGCCTTTGCCTGGCATATCGATCTGGC 
CAAGCGGACCGGTAAACCGCTGATGATCCACAATCGTCAGGCCGACCGCGACGTGCTGGACGT 

GCTGCGGGCCGAGGGCGCGCCGGACACCGTGATCTTGCACTGCTTCTCGTCGGACGCGGCGA
TGGCCCGCACGTGTGTGGACGCCGGGTGGCTGCTCAGCCTGTCCGGGACGGTGAGCTTCCGT 
ACCGCCCGTGMCTACGGGAAGCCGTCCCGCTGATGCCGGTGGAGCAGCTTTTGGTGGAAACC 
GATGCACCGTATTTGACCCCGCATCCCCACCGGGGCTTGGCGAACGAACCGTACTGCCTGCCC 
TATACCGTGCGGGCGCTGGCTGAACTGGTCAATCGGCGCCCCGAAGAGGTGGCGCTCATCACC 

ACAAGCAACGCTCGCCGAGCTTATGGGCTAGGGTGGATGCGCCAATGA

>Rv1009 - lipoprotein, similar to various other MTB proteins TB.seq 1128089:1129174 MW:38079
>emb|AL123456|MTBH37RV:1128089-1129177, Rv1009 SEQ ID NO:38

ATGTTGCGCCTGGTAGTCGGTGCGCTGCTGCTGGTGTTGGCGTTC6CCGGTGGCTATGCGGTC 
GCCGCATGCAAAACGGTGACGTTGACCGTCGACGGAACCGCGATGCGGGTGACCACGATGAAA 
TCGCGGGTGATCGACATCGTCGAAGAGAACGGGTTCTCAGTCGACGACCGCGACGACCTGTAT
CCCGCGGCCGGCGTGCAGGTCCATGACGCCGACACCATCGTGCTGCGGCGTAGCCGTCCGCT 
GCAGATCTCGCTGGATGGTCACGACGCTAAGCAGGTGTGGACGACCGCGTCGACGGTGGACG 
AGGCGCTGGCCCAACTCGCGATGACCGACACGGCGCCGGCCGCGGCTTCTCGCGCCAGCCGC 
GTCCCGCTGTCCGGGATGGCGCTACCGGTCGTCAGCGCCAAGACGGTGCAGCTCAACGACGG 

CGGGTTGGTGCGCACGGTGCACTTGCCGGCCCCCAATGTCGCGGGGCTGCTGAGTGGGGCCG
GCGTGCCGCTGTTGCAAAGCGACCACGTGGTGCCCGCCGCGACGGCCCCGATCGTCGAAGGC 
ATGCAGATCCAGGTGACCCGCAATCGGATCAAGAAGGTCACCGAGCGGCTGCCGCTGCCGCCG 
AACGCGCGTCGTGTCGAGGACCCGGAGATGAACATGAGCCGGGAGGTCGTCGAAGACCCGGG 
GGTTCCGGGGACCCAGGATGTGACGTTCGCGGTAGCTGAGGTCAACGGCGTCGAGACCGGCC 

GTTTGCCCGTCGCCAACGTCGTGGTGACCCCGGCCCACGAAGCCGTGGTGCGGGTGGGCACC
AAGCCCGGTACCGAGGTGCCCCCGGTGATCGACGGAAGCATCTGGGACGCGATCGCCGGCTG 
TGAGGCCGGTGGCAACTGGGCGATCAACACCGGCAACGGGTATTACGGTGGTGTGCAGTTTGA 
CCAGGGCACCTGGGAGGCCAACGGCGGGCTGCGGTATGCACCCCGCGCTGACCTCGCCACCC 
GCGAAGAGCAGATCGCCGTTGCCGAGGTGACCCGACTGCGTCAAGGTTGGGGCGCCTGGCCG 

GTATGTGCTGCACGAGCGGGTGCGCGCTGA

>Rv1010 ksgA 16S rRNA dimethyltransferase TB.seq 1129150:1130100 MW:34647
>emb|AL123456|MTBH37RV:1129150-1130103, ksgA SEQ ID NO:39 

ATGTGCTGCACGAGCGGGTGCGCGCTGACCATCCGGCTGCTCGGGCGCACTGAGATCAGGCG 
GCTGGCCAAAGAGCTCGACTTTCGGCCGCGCAAATCTCTCGGACAGAACTTCGTGCACGACGC
CAACACGGTGCGACGGGTGGTTGCCGCCTCCGGGGTCAGCCGTTCCGACCTGGTTTTGGAGGT 
CGGGCCGGGCCTGGGATCGCTGACCCTGGCACTGCTCGACCGCGGCGCGACCGTCACCGCGG 
TCGAGATCGATCCACTACTGGCTTCTCGGCTGCAACAGACCGTGGCGGAGCACTCGCACAGCG 
AGGTTCACCGACTAACGGTGGTCAATCGCGACGTCCTGGCCCTGCGCCGGGAGGATCTAGCCG 
CGGCGCCGACCGCGGTGGTTGCCAATCTGCCGTACAACGTAGCGGTACCGGCGTTGTTGCATC
TGCTTGTCGAGTTCCCGTCGATCCGTGTCGTGACGGTGATGGTGCAGGCCGAGGTCGCCGAAC 
GGCTCGCCGCCGAGCCGGGCAGCAAAGAGTACGGCGTGCCCAGCGTTAAGCTGCGCTTCTTC 
GGGCGGGTTCGCCGCTGCGGCATGGTGTCGCCGACCGTTTTCTGGCCCATTCCGCGTGTCTAT 
TCCGGGCTGGTACGCATCGATCGATATGAGACCTCGCCCTGGCCCACCGACGACGCTTTTCGA 
CGGCGGGTATTCGAACTCGTGGACATCGCATTCGCGCAGCGGCGCAAGACTTCTCGCAACGCG
TTTGTGCAGTGGGCGGGCTCGGGAAGCGAGTCGGGGAATCGATTGTTGGCGGCCAGCATCGAC 
CCCGCCCGTCGCGGTGAGACGCTGTCCATCGACGACTTCGTGCGGCTGCTGCGACGGTCCGG 
CGGCTCCGACGAGGCCACCA6CACCGGCCGGGACGCCAGGQCGCCGGACATTTCGGGGCAC
GCGTCGGCGAGCTGA 

>Rv1011 - Homology to E.coli protein YcbH TB.seq 1130189:1131106 MW:31350
>emb|AL123456|MTBH37RV:1130189-1131109, Rv1011 SEQ ID NO:40

GTGCCCACCGGGTCGGTCACCGTTCGGGTGCCCGGAAAGGTCAACCTCTATCTGGCGGTCGGC 

GATCGCCGCGAGGACGGCTATCACGAGCTGACCACGGTATTTCATGCCGTCTCGCTGGTCGAC 

GAGGTAACCGTTCGTAACGCTGATGTGCTCTCGCTCGAGTTGGTCGGCGAGGGGGCCGACCAG 

CTGCCGACCGACGAACGCAATCTCGCCTGGCAGGCGGCCGAGCTGATGGCCGAACACGTGGG 

CCGGGCGCCGGACGTCTCGATCATGATCGACAAATCCATTCCGGTCGCCGGCGGCATGGCCG 

GTGGCAGCGCGGACGCTGCGGCGGTCCTGGTTGCGATGMCTCGTTGTGGGAACTCAATGTGC 

CCCGCCGCGACCTGCGCATGCTCGCCGCGCGGCTAGGCAGCGATGTGCCGTTTGCCCTGCAT 

GGTGGTACCGCGCTGGGGACGGGTCGCGGCGAGGAGTTGGCCACCGTGTTATCCCGCAACAC 

CTTCGACTGGGTCCTGGCGTTCGCCGACAGCGGGTTGCTCACCTCCGCGGTGTACAACGAGCT 

CGACCGGCTCAGGGAGGTGGGGGATCCGCCCCGGCTTGGTGAGCCCGGGCCGGTTCTGGCTG 

CCTTAGCTGCGGGTGATCCGGATCAGCTGGCGCCGTTGCTGGGTAATGAAATGCAAGCGGCCG 

CGGTGAGCCTGGACCCGGCGCTGGCTCGTGCGTTACGCGCCGGTGTGGAGGCCGGCGCGCTC 

GCAGGCATCGTGTCCGGTTCGGGTCCCACGTGTGCCTTCCTGTGCACCTCGGCGAGCTCGGCG 

ATCGATGTCGGCGCGCAGCTGTCGGGGGCGGGAGTTTGTCGCACCGTTCGAGTCGCCACCGG 

GCCGGTACCCGGCGCCCGCGTGGTGTCTGCGCCGACCGAAGTGTGA 

>Rv1106c - cholesterol dehydrogenase TB.seq 1232845:1233954 MW:40743
>emb|AL123456|MTBH37RV:c1233954-1232842, Rv1106c SEQ ID NO:41

ATGCTTCGCCGCATGGGTGATGCATCGCTGACAACCGAGCTCGGCCGCGTTCTGGTCACCGGC 

GGCGCGGGCTTCGTGGGCGCCAACCTGGTGACCACCTTGCTGGACCGCGGGCACTGGGTGCG 

TTCCTTCGACCGCGCGCCGTCGCTGTTGCCTGCGCATCCGCAACTGGAGGTGCTGCAAGGGGA 

CATCACCGACGCGGACGTCTGCGCCGCGGCCGTGGACGGCATGGACACGATCTTCCACACCG 

CAGCGATCATCGAGCTGATGGGCGGCGCGTCGGTCACCGACGAGTACCGCCAACGTAGCTTTG 

CGGTCAACGTCGGCGGCACCGAGAACCTGCTGCACGCCGGCCAGCGGGCCGGGGTGCAGCG 

GTTCGTCTACACGTCATCCAACAGTGTGGTGATGGGCGGCCAGAACATCGCCGGCGGTGACGA 

GACGCTGCCCTATAGGGACGGGTTCAACGACCTCTACACCGAGACCAAGGTGGTTGCCGAGCG 

ATTCGTGTTGGCCCAGAACGGTGTCGACGGCATGCTGACGTGCGCGATCCGGCCCAGCGGCAT 

CTGGGGAAACGGCGATCAGACGATGTTCCGCAAGCTGTTCGAAAGTGTGCTCAAGGGCCACGT 

CAAGGTGCTGGTCGGGCGCAAGTCGGCCCGGCTGGATAACTCTTACGTGCACAAGCTGATTCA 

CGGTTTCATCTTGGCCGCTGCCCATCTGGTGCCGGACGGCACAGCGCCCGGGCAGGCTTACTT 

CATCAACGACGCAGAGCCGATCAATATGTTCGAGTTCGCTCGGCCGGTGCTCGAGGCGTGCGG 

GCAGCGCTGGCCGAAGATGCGGATTTCCGGCCCCGCGGTCCGCTGGGTAATGACGGGGTGGC 
AGCGGCTGCACTTCCGGTTCGGATTCCCCGCGCCGCTGCTCGAGCCGCTGOCCGTCGAACGAC
TGTACCTGGACMCTACTTTTCGATCGCTAAGGCACGCCGCGACCTGGGCTATGAGCCGCTGTT 
CACCACCCAGCAGGCGCTGACCGAATGCCTGCGGTACTACGTGAGTCTGTTTGAGCAGATGAA 
GAACGAGGCCCGGGCGGAAAAAACGGCCGCCACAGTCAAGCCGTAG 

>Rv1110 lytB2 TB.seq 1236183:1237187 MW:36298
>emb|AL123456|MTBH37RV:1236183-1237190, lytB' SEQ ID NO:42

ATGGTTCCGACGGTCGACATGGGGATTCCCGGGGCTTCGGTATCGTCGCGATCGGTGGCCGAC 
CGTCCCAACCGTAAGCGGGTGCTGCTGGCCGAGCCGCGTGGCTACTGCGCTGGCGTGGATCG 

GGCCGTCGAAACGGTCGAACGCGCGCTTCAAAAACACGGCCCGCCTGTCTACGTGCGTCACGA
GATCGTGCATMCCGCCACGTGGTTGACACCCTGGCTMGGCCGGTGCGGTTTTCGTCGAAGA 
GACCGAGCAGGTTCCCGAGGGAGCGATTGTGGTGTTCTCCGCGCACGGGGTCGCGCCTACGG 
TGCACGTCAGCGCCAGCGAGCGCAACCTGCAGGTCATTGACGCCACCTGCCCGCTGGTCACCA 
AGGTGCACMCGAGGCCAGGCGGTTCGCCCGGGACGACTACGACATCTTGCTGATCGGTCATG 

AGGGCCACGAGGAAGTCGTCGGTACTGCTGGGGAAGCTCCCGATCATGTGCAGCTGGTCGACG
GGGTGGACGCCGTCGACCAGGTGACCGTCCGTGACGAGGACAAAGTGGTTTGGCTGTCGCAG 
ACCACCCTGTCCGTCGATGAGACCATGGAGATTGTCGGGCGGTTGCGTCGGCGTTTCCCCAAG 
CTGCAGGATCCGCCCAGCGACGACATCTGCTATGCGACCCAGAATCGGCAGGTCGCGGTCAAG 
GCGATGGCGCCCGAGTGCGAGCTGGTCATCGTGGTCGGCTCGCGCAATTCGTCGAATTCGGTT 

CGGCTGGTCGAGGTGGCGCTGGGTGCCGGGGCGCGGGCCGCCCACCTGGTGGACTGGGCCG
ACGATATCGACTCGGCCTGGCTGGACGGCGTTACCACGGTCGGCGTTACGTCGGGGGCATCGG 
TCCCCGAGGTGCTGGTGCGCGGTGTGCTGGAGCGGCTGGCCGAATGCGGCTACGACATCGTG 
CAAGCGGTGACAACGGCCAACGAGACGTTGGTGTTCGCATTGCCCCGGGAGCTCCGCTCACCT 
CGCTGA 

>Rv1216c - TB.seq 1359473:1360144 MW:24863 

>emb|AL123456|MTBH37RV:c1360144-1359470, Rv1216c SEQ ID NO:43

ATGCACATTGGGCTGMGATATTCATATGGGGCGTGTTAGGACTCGTCGTTTTCGGCGCGCTCC 
TATTCGGGCCAGCCGGCACGTTCGACTATTGGCAGGCGTGGGTGTTCCTCGCCGCATTTGTGA 

GCACCACGATTGGCCCCACAATCTATCTGGCTCGCAACGATCCCGCGGCCCTTCAACGTCGCAT
GCGCAGCGGTCCGCTCGCGGAGGGCCGMCGATTCAGMGTTCATCGTCATCGGCGCTTTTCT 
GGGGTTCTTCGCGATGATGGTGCTGAGCGCGTGCGACCATCGTTATGGTTGGTCGTCAGTGCC 
AGCCGCGGTGTGCGTGATCGGCGACGTCCTAGTGATGACGGGCCTTGGGATCGCCATGCTGGT 
GGTCATCCAGAACAGGTATGCCGCCTCGACGGTCAGGGTGGAGGCGGGCCAGATATTGGCCTC 

CGACGGTCTCTACAAAATTGTCCGACACCCGATGTACGCCGGGAACGTGGTCATGATGACAGG
CATACCGCTGGCACTGGGCTCTTACTGGGCGATGTTCATCCTCGTCCCCGGCACACTGGTGTTG 
GTGTTCCGCATCCTCGACGA6GAAAAACTACTGACGCAAGAACTCAGCGGGTACCGCGAATACC
GGCAACTGGTGCGCTACCGGTTGGTGCCCTACGTGTGGTAG 

>Rv1223 htrA TB.seq 1365810:1367456 MW:56547
>emb|AL123456|MTBH37RV:1365810-1367459, htrA SEQ ID NO:44

GTGAGCCACTTGTCGCAGCGCATGGCGGGGTTGCTGCGAGTTCATGGCGAGTGGTCGCGATCC 
GTGGATACTAGGGTGGACACGGACMCGCGATGCCTGCACGTTTTAGCGCCCAGATTCAGAAT 
GAGGATGAGGTGACCTCCGACCAAGGCAACAACGGCGGCCCGAACGGCGGAGGCCGCCTGGC 
GCCGCGCCCGGTTTTTCGGCCACCGGTCGACCCGGCGTCGCGTCAAGCGTTCGGGCGTCCGT 

CCGGGGTCCAAGGGTCCTTTGTGGCCGAGCGTGTGCGCCCGCAGAAGTACCAGGACCAGTCT
GACTTCACACCGMCGATCAGCTTGCTGACCCGGTGCTTCAGGAGGCGTTCGGTCGTCCGTTC 
GCGGGCGCCGAATCGCTGCAGCGCCATCCCATCGATGCCGGAGCGCTGGCAGCTGAGAAAGA 
CGGT6CCGGCCCCGACGAGCCCGACGATCCGTGGCGCGACCCCGCGGCCGCGGCCGCGCTG 
GGGACGCCAGCGCTAGCCGCGCCGGCACCGCACGGTGCGCTGGCCGGCAGCGGCAAGCTGG 

GTGTGCGCGACGTGCTGTTTGGCGGCAAGGTGTCCTACTTGGCGCTGGGCATCTTGGTCGCTA
TCGCACTGGTGATCGGCGGCATCGGCGGTGTCATCGGCCGCAAGACCGCGGAAGTAGTCGAT 
GCGTTCACCACGTCGAAGGTGACCCTGTCGACCACTGGCAATGCCCAGGAACCGGCCGGCCG 
GTTCACCAAGGTGGCGGCCGCCGTGGCCGATTCGGTGGTGACCATTGAGTCGGTCAGCGACCA 
GGAGGGCATGCAAGGTTCCGGCGTCATCGTCGATGGCCGCGGCTACATCGTCACCAACAATCA 

CGTGATCTCTGAGGCGGCCAACAATCCCAGCCAGTTCAAGACGACCGTGGTGTTCAACGACGG
CAAGGAGGTGCCCGCCAATCTGGTGGGTCGTGACCCCAAGACCGACTTGGCCGTCCTCAAGGT 
CGACAACGTCGACAATCTGACCGTGGCCCGGCTCGGTGATTCCAGCAAGGTACGGGTCGGTGA 
CGAAGTCCTCGCGGTCGGCGCGCCCCTGGGGCTGCGCAGTACGGTGACCCAGGGCATTGTCA 
GCGCGCTACACCGCCCCGTTCCGTTGTCGGGCGAGGGCTCTGACACCGACACCGTCATTGACG 

CAATTCAGACCGACGCCTCGATCAACCACGGTAACTCCGGCGGTCCGCTAATCGACATGGATGC
CCAGGTGATTGGCATCAACACCGCCGGTAAGTCACTGTCGGATAGCGCCAGCGGGCTGGGCTT 
TGCGATCCCGGTCAACGAGATGAAATTGGTGGCAAATTCTCTGATCAAAGACGGAAAGATCGTG 
CATCCGACGTTGGGCATCAGCACCCGGTCAGTAAGCAACGCGATCGCGTCGGGCGCGCAGGT 
GGCCAATGTAAAGGCGGGAAGTCCCGCGCAGAAGGGCGGGATCTTGGAGAACGATGTGATCGT 

CAAGGTCGGTAACCGCGCGGTCGCCGACTCCGACGAGTTCGTCGTCGCCGTGCGCCAGTTGG
CTATCGGCCAGGACGCTCCGATAGAGGTGGTCCGCGAGGGTCGGCATGTGACGCTGACGGTG 
AAACCGGACCCCGATAGCACCTAG 

>Rv1224 - TB.seq 1367461:1367853 MW:14083
>emb|AL123456|MTBH37RV:1367461-1367856, Rv1224 SEQ ID NO:45

GTGTTCGCCAACATCGGTTGGTGGGAAATGCTCGTCCTCGTCATGGTCGGGCTGGTGGTGCTT 
GGCCCGGAGCGGCTCCCGGGTGCCATCCGCTGGGCGGCAAGCGCTCTGCGGCAGGCGCGCG 
ACTATCTCAGCGGTGTGACCAGCXSAGCTACGTGAGGACATTGGACCCGAATTCGATGATCTGCG
GGGACATCTCGGTGAGCTGCAGAAGCTACGGGGAATGACTCCGCGGGCTGCGTTGACCAAGCA 
CCTACTGGATGGCGATGATTCGCTGTTCACCGGAGACTTCGACCGACCGACGCCGAAGAAACC 
GGATGCGGCGGGCTCGGCGGGGCCGGACGCTACTGAGCAGATCGGTGCGGGGCCCATCCCG 
TTTGACAGCGATGCCACCTAG

>Rv1229c mrp similar to MRP/NBP35 ATP-binding proteins TB.seq 1371778:1372947 MW:41064 
>emb|AL123456|MTBH37RV:c1372947-1371775, mrp SEQ ID NO:46 

ATGCCAAGCCGCCTACACTCGGCGGTGATGTCCGGAACTCGTGATGGCGACCTGAACGCGGCG 

ATACGCACCGCGCTGGGCAAGGTAATCGACCCCGAATTGCGGCGCCCCATCACCGAACTGGGG
ATGGTCAAAAGCATCGACACCGGCCCGGATGGGAGCGTGCACGTCGAGATCTACCTGACCATC 
GCCGGCTGCCCGAAGAAGTCCGAAATCACCGAGCGTGTCACCCGGGCGGTCGCCGACGTGCC 
AGGCACTTCGGCGGTGCGGGTCAGCTTGGACGTGATGAGCGACGAGCAGCGCACCGAGCTGC 
GTAAGCAGTTGCGTGGCGATACCCGCGAACCCGTCATCCCGTTCGCGCAACCCGA7TCGTTGAC 

CCGGGTGTATGCCGTGGCTTCCGGTAAGGGCGGAGTCGGAAAGTCCACCGTCACGGTCAACCT
GGCCGCCGCGATGGCCGTCCGGGGCCTGTCGATCGGGGTGCTGGACGCTGATATCCACGGCC 
ACTCTATCCCCCGGATGATGGGCACCACCGACCGGCCTACCCAGGTTGAGTCGATGATCCTGC 
CGCCGATCGCCCACCAGGTGAAGGTCATCTCGATAGCCCAGTTCACCCAGGGCAACACCCCGG 
TGGTGTGGCGCGGGCCGATGCTGCACCGGGCGTTGCAGCAGTTTCTGGCCGACGTGTACTGG 

GGGGATCTGGACGTGCTGCTGCTGGACTTGCCGCCCGGAACCGGCGACGTCGCCATCTCGGT
GGCTCAACTGATCCCCAACGCCGAACTCCTGGTGGTCACCACCCCGCAGCTGGCCGCCGCGGA 
GGTGGCCGAACGGGCCGGCAGCATCGCGCTGCAAACCCGCCAACGCATCGTCGGCGTCGTGG 
AGAACATGTCGGGGCTCACGCTGCCGGACGGCACCACGATGCAGGTGTTCGGCGAGGGCGGT 
GGCCGGCTGGTCGCCGAGCGGTTGTCGCGTGCGGTCGGCGCCGACGTGCCGCTGCTGGGTCA 

GATCCCGCTGGACCCCGCACTGGTGGCCGCCGGCGATTCGGGCGTACCGCTCGTGTTGAGCT
CGCCGGACTCGGCGATCGGCAAGGAACTGCATAGCATCGCCGACGGCTTGTCGACTCGACGAC 
GGGGATTGGCGGGCATGTCGCTGGGGTTGGACCCGACACGACGCTAG 

>Rv1239c corA magnesium and cobalt transport protein TB.seq 1381943:1383040 MW:41470
>emb|AL123456|MTBH37RV:c1383040-1381940, corA SEQ ID NO:47

GTGTTCCCAGGGTTTGACGCATTGCCCGAAGTGCTGCGACCGGTCGCGCGACCCCAGCCGCCG 
AACGCACACCCCGTTGCCCAGCCACCGGCCCAAGCCTTGGTCGACTGCGGTGTCTACGTCTGC 
GGCCAGCGACTGCCCGGCAAGTACACCTACGCCGCCGCGCTGCGCGAGGTGCGCGAGATCGA 
ACTGACCGGGCAGGAGGCGTTCGTCTGGATCGGGCTGCACGAGCCCGATGAAAACCAGATGCA 

GGACGTAGCAGACGTTTTCGGGTTGCACCCGTTAGCCGTTGAGGACGCCGTGCACGCGCACCA
GCGACCCAAGTTGGAGCGCTACGACGAGACGCTGTTCCTCGTCCTCAAGACCGTCAACTACGT 
CCCGCACGAATCGGTGGTACTGGCCCGCGAGATCGTCAAAACCGGCGAGATCATGATCTTCGT 






WO 01/35317 



PCT/US00/31152 



CGGCAAGGATTTCGTGGTCACCGTCCGCCACGGCGAACACGGCGGGTTATCCGAGGTGCGTAA 
GCGGATGGATGCCGACCCCGAACATTTGCGGTTGGGACCGTATGCGGTGATGCACGCGATCGC 
CGACTACGTGGTCGACCACTACCTCGAGGTGACCAATCTCATGGAGACCGATATCGACAGCATC 
GAGGAAGTAGGGTTCGCGCCGGGCCGCAAGCTCGACATCGAACCGATCTATCTGCTCAAGCGG 
GAAGTGGTCGAGTTGCGCCGGTGCGTGAATCCGCTATCGACCGCATTCCAGCGCATGCAGACC
GAGAGCAAAGACCTCATTTCGAAAGAAGTGCGGCGCTACCTGCGCGACGTCGCCGACCACCAG 
ACCGAGGCCGCCGACCAGATCGCCAGCTACGACGACATGCTCAACTCGCTGGTGCAGGCCGC 
GCTCGCCCGGGTCGGCATGCAGCAAAACATGGACATGCGCAAGATATCCGCGTGGGCAGGTAT 
CATCGCGGTCCCCACCATGATCGCGGGCATCTATGGCATGAACTTTCACTTCATGCCCGAGCTG 
GACTCCAGGTGGG6TTACCCGACAGTGATCGGCGGGATGGTCCTTATCTGTCTGTTCCTCTACC
ACGTCTTCCGCAACAGAAACTGGCTCTAG 

>Rv1279 - TB.seq 1430060:1431643 MW:57332
>emb|AL123456|MTBH37RV:1430060-1431646, Rv1279 SEQ ID NO:48

ATGGACACTCAGAGCGACTACGTCGTGGTCGGTACCGGCTCAGCCGGGGCGGTTGTGGCCAG
GCGGCTTAGCACCGATCCGGCCACGACGGTGGTGGCCCTGGAGGCGGGGCCGCGTGACAAGA 
ACAGATTCATCGGCGTCCCAGCGGCGTTTTCCAAGCTGTTCCGCAGCGAGATCGACTGGGATTA 
CCTAACCGAACCGCAGCCGGAGCTCGACGGCCGCGAAATCTATTGGCCTCGTGGCAAGGTGCT 
CGGTGGCTCGTCGTCCATGAACGCAATGATGTGGGTGCGTGGATTCGCATCAGACTACGATGA 

GTGGGCCGCGCGAGCCGGTCCGCGGTGGTCGTACGCCGACGTGCTCGGCTACTTTCGCCGCA
TCGAGAACGTCACCGCTGCCTGGCACTTTGTCAGCGGTGACGACAGCGGAGTAACCGGTCCGT 
TGCATATTTCCCGGCAACGCAGCCCAAGATCGGTGACCGCAGCGTGGCTGGCAGCCGCACGTG 
AGTGCGGATTTGCCGCTGCGCGGCCGMTTCCCCTCGACCGGAAGGCTTTTGCGAGACCGTCG 
TCACCCAGCGCCGCGGTGCTCGA7TCAGTACTGCCGACGCCTATCTGAAGCCCGCGATGCGCC 

GTAAAAACCTCCGTGTGCTTACCGGCGCCACTGCTACCCGGGTGGTCATCGACGGCGACCGGG
CCGTCGGCGTGGAATACCAAAGCGACGGTCAAACCCGCATCGTCTACGCCCGCCGCGAGGTG 
GTGCTCTGCGCTGGTGCCGTCAACAGCCCTCAGCTGCTGATGCTCTCCGGCATCGGCGACCGC 
GACCACCTCGCCGAACAGGACATCGACACCGTTTACCAGGCGCCCGAGGTCGGGTGCAACCTG 
CTCGATCATCTCGTCACGGTGCTGGGTTTCGAGGTGGAAAAGGACAGCTTGTTTGCCGCCGAGA 

AGCCCGGCCAGTTGATCAGCTACTTACTGCGACGCGGCGGCATGCTCACCTCCAACGTCGGCG
AGGCGTACGGATTTGTCCGGAGCCGACCCGMCTGAAGCTGCCCGATTTGGAGTTGATTTTTGC 
CCCGGCGCCGTTTTACGACGAAGCGCTGGTTGCACCGGCTGGTCACGGTGTGGTATTCGGCCC 
GATTCTGGTCGCGCCGCAAAGCCGTGGCCAGATCACGGTGCGGTCCGCCGATCCGCATGCCAA 
GCCTGTCATCGAACCGCGTTACCTGTCCGATCTCGGTGGCGTAGACCGGGCGGCCATGATGGC 

GGGGCTGCGGATATGCGCGCGGATCGCGCAGGCCCGCCCGGTCAGAGATCTCCTTGGGTCCA
TCGCGCGACCGCGCAACAGCACCGAGCTGGACGAGGCCACTCTCGAGTTGGCGCTGGCCACT 
TGTTCGCACACCCTGTACCACCCGATGGGCACCTGCCGCATGGGCAGGGACGAGGCCAGCGT 






GGTGGATCCGCAGCTGCGGGTCCGCGGTGTCQACGGACTCCGCGTCGCCGACGCGTCGGTGA 
TGCCCAGCACGGTTCGTGGGCATACGCATGCGCCGTCGGTGCTGATCGGGGAGAAGGCCGCC 
GACTTAATCCGCAGCTGA 

>Rv1294 thrA homoserine dehydrogenase TB.seq 1449373:1450695 MW:45522
>emb|AL123456|MTBH37RV:1449373-1450698, thrA SEQ ID NO:49 

GTGCCCG6TGACGAAAAGCCGGTCGGCGTAGCGGTACTCGGTTTGGGCAACGTCGGCAGCGA 
GGTTGTCCGCATCATCGAGAACAGCGCCGAGGATCTCGCGGCTCGTGTCGGTGCCCCATTGGT 
CCTGCGGGGCATCGGCGTGCGCCGCGTGACGACCGATCGCGGCGTGCCGATCGAATTGTTGA 

CCGACGACATTGAAGAGCTCGTGGCCCGCGAGGATGTCGATATCGTGGTGGAAGTGATGGGGC
CGGTGGAACCGTCGCGCAAGGCGATCCTGGGCGCCCTTGAGCGCGGCAAGTCCGTCGTTACG 
GCGAACAAGGCTTTACTCGCCACCTCCACCGGCGAATTGGGACAGGCCGCCGAAAGGGCCCAT 
GTTGATCTGTATTTCGAGGCGGCCGTGGCGGGCGCCATTCCGGTCATCCGTCCGCTCACCCAG 
TCGCTGGCCGGCGACACGGTGCTGCGAGTGGCCGGGATCGTCAACGGCACCACCAACTACATC 

CTCTCGGCGATGGACAGCACCGGCGCTGACTATGCCAGCGCCCTGGCCGACGCAAGTGCGCT
GGGCTATGCGGAGGCTGATCCCACCGCAGACGTCGAAGGCTACGACGCCGCGGCCAAGGCAG 
CGATCCTGGCATCCATTGCCTTCCACACCCGGGTGACCGCAGACGACGTGTATCGCGAAGGCA 
TCACCAAGGTCACTCCGGCCGACTTCGGATCCGCGCACGCGCTGGGTTGGACCATCAAACTGC 
TGTCGATCTGTGAGCGCATAACCACCGACGAAGGTTCGCAGCGGGTATCGGCCCGCGTCTATC 

CGGCCCTGGTACCTCTGTCGCATCCGCTTGCCGCGGTCAACGGCGCGTTCAATGCCGTGGTGG
TCGAGGCCGAGGCCGCGGGCCGGCTGATGTTCTACGGCCAGGGCGCGGGCGGCGCGCCGAC 
CGCCTCTGCGGTGACCGGTGACCTAGTGATGGCCGCCCGCAACCGGGTACTCGGCAGCCGCG 
GCCCCCGTGAGTCTAMTACGCTCMCTTCCGGTGGCACCMTGGGTTTCATTGAAACGCGCTA 
TTACGTCAGCATGAACGTCGCCGACAAGCCGGGCGTCTTGTCCGCGGTGGCGGCGGAATTCGC 

CAAACGCGAGGTGAGCATCGCCGAGGTGCGCCAGGAGGGCGTTGTGGACGAAGGTGGTCGAC
GGGTGGGAGCCCGAATCGTGGTGGTCACGCACCTCGCCACTGACGCCGCACTCTCGGAAACC 
GTTGATGCACTGGACGACTTGGATGTCGTGCAGGGTGTGTCCAGCGTGATACGACTGGAAGGA 
ACCGGCTTATGA 

>Rv1323 fadA4 acetyl-CoA C-acetyltransferase (aka thil) TB.seq 1485860:1487026 MW:40049
>emb|AL123456|MTBH37RV:1485860-1487029, fadA4 SEQ ID NO:50

GTGATTGTTGCTGGCGCGCGTACACCCATCGGCAAGTTGATGGGCTCCCTGAAGGATTTCAGCG 
CCAGCGAGCTGGGTGCCATCGCCATTAAGGGCGCCCTGGAGAAGGCCAACGTGCCGGCGTCC 
TTGGTCGAGTACGTGATCATGGGCCAGGTGTTGACCGCGGGTGCCGGGCAAATGCCCGCACG 
GCAGGCGGCAGTGGCGGCCGGCATCGGTTGGGATGTCCCTGCGCTGACGATCAACAAGATGT
GCCTGTCCGGCATCGACGCAATCGCGCTGGCTGATCAACTCATTCGGGCCAGAGAGTTCGACG 
TGGTGGTGGCCGGCGGTCAGGAGTCGATGACGAAGGCGCCGCACCTGTTGATGAATAGCCGGT 
CGGGTTACMGTACGGCGACGTTACGGTTTTGGACCACATGGCCTACGACGGTCTGCACGACG
TGTTCACCGATCAGCCGATGGGCGCGCTCACCGAGCAACGCAACGACGTCGACATGTTCACCC 
GCTCCGAACAGGACGAGTACGCGGCTGCGTCCCACCAAAAGGCGGCCGCGGCATGGAAGGAC 
GGCGTATTCGCCGACGAGGTGATCCCGGTGAACATCCCGCAGCGCACGGGCGATCCACTGCA 

GTTCACCGAGGACGAGGGGATCCGCGCCAACACCACCGCCGCCGCGCTGGCCGGTCTGAAGC
CGGCGTTCCGTGGCGACGGCACCATCACCGCCGGGTCGGCGTCACAGATCTCCGACGGTGCG 
GCCGGGGTGGTGGTCATGAACCAGGAAAAGGCCCAGGAACTGGGGCTGACCTGGCTAGCCGA 
GATCGGCGCCCACGGTGTGGTGGCCGGGCCGGATTCCACACTGCAATCGCAGCCGGCCAACG 
CGATCAACAAGGCGCTGGATCGCGAGGGCATCTCGGTGGACCAGCTCGACGTGGTGGAGATCA 

ACGAGGCGTTCGCTGCGGTGGCATTGGCCTCGATACGCGAACTCGGGCTGAACCCCCAGATCG
TCAACGTCAACGGTGGTGCGATTGCCGTCGGGCATCCCCTCGGCATGTCAGGGACGCGAATCA 
CGCTACATGCGGCGCTGCAGTTGGCACGCCGGGGATCGGGCGTCGGGGTTGCCGCATTGTGC 
GGGGCTGGCGGGCAGGGCGACGCACTGATATTGCGGGCCGGATAG 

>Rv1389 gmk putative guanylate kinase TB.seq 1564399:1565022 MW:22064
>emb|AL123456|MTBH37RV:1564399-1565025, gmk SEQ ID NO:51

GTGAGCGTCGGCGAGGGACCGGACACCAAGCCCACCGCGCGTGGCCAACCGGCGGCAGTGG 

GACGTGTGGTGGTGCTGTCCGGTCCTTCCGCGGTCGGCAAATCCACGGTGGTTCGGTGTCTGC 

GCGAGCGGATCCCGAATCTGCATTTCAGTGTCTCGGCCACGACGCGGGCGCCACGCCCGGGC 

GAGGTCGACGGTGTCGACTACCACTTCATCGACCCCACCCGCTTTCAGCAGCTCATCGACCAG
GGTGAGTTGCTGGAATGGGCAGAAATCCACGGCGGCCTGCACCGGTCGGGCACTTTGGCCCA 
GCCGGTGCGGGCGGCCGCGGCGACTGGTGTGCCGGTGCTTATCGAGGTTGACCTGGCCGGGG 
CCAGGGCGATCAAGAAGACGATGCCCGAGGCTGTCACCGTGTTTCTGGCGCCACCTAGCTGGC 
AGGATCTTCAGGCCAGACTGATTGGCCGCGGCACCGAAACAGCTGACGTTATCCAACGCCGCC 

TGGACACCGCGCGGATCGAATTGGCAGCGCAGGGCGACTTTGACAAGGTCGTGGTGAACAGGC
GATTAGAGTCTGCGTGTGCGGAATTGGTATCCTTGCTGGTGGGAACGGCACCGGGCTCCCCGT 
GA 

>Rv1407 fmu similar to Fmu protein TB.seq 1583099:1584469 MW:48494
>emb|AL123456|MTBH37RV:1583099-1584472, fmu SEQ ID NO:52

ATGACCCCTAGATCGCGTGGGCCGCGCCGCCGGCCGCTGGACCCGGCGCGTCGTGCGGCCTT 
CGAGACGCTGCGGGCGGTTAGTGCGCGCGACGCCTACGCGAACCTGGTGTTGCCCGCGCTGC 
TGGCCCAACGCGGTATCGGCGGTCGCGACGCCGCGTTCGCCACCGAGCTGACATACGGCACC 
TGCCGAGCCCGCGGCCTGCTCGACGCGGTCATCGGTGCGGCCGCCGAGCGTTCGCCGCAGGC 

GATCGATCCGGTGCTGCTAGACCTGTTGCGGCTCGGCACCTACCAATTGCTGCGCACGCGGGT
CGACGCACACGCCGCAGTGTCGACCACCGTCGAGCAGGCCGGAATCGAATTCGATTCGGCGC 
GAGCAGGTTTCGTCAACGGTGTACTACGAACGATCGGCGGCCGAGACGAGCGGTCCTGGGTTG 
GCGAACTCGCTCCTGATGCGCAGAACGATCCGATCGGGCATGCCGCGTTCGTGCATGCGCATC
CCCGATGGATCGCCCAGGCCTTTGCTGACGCGTTGGGCGCGGCGGTCGGGGAGCTCGAGGCA 
GTTTTGGCCAGCGACGACGAACGGCCAGCGGTGCACCTGGCGGCACGCCCCGGGGTGCTGAC 
CGCCGGCGAACTGGCCCGCGCGGTGCGCGGAACCGTCGGTCGGTATTCGCCGTTTGCGGTGT 
ATCTGCCGCGCGGTGACCCGGGGCGACTGGCGCCGGTGCGCGACGGCCAAGCGCTGGTCCA
GGACGAGGGCAGCCAGTTAGTCGCCCGAGCATTGACCCTGGCGCCAGTCGACGGCGATACCG 
GACGGTGGCTGGACCTGTGTGCCGGACCGGGCGGCAAGACCGCGCTGTTGGCCGGGCTGGGT 
TTGCAGTGCGCAGCCCGGGTGACCGCGGTGGAACCCTCGCCACACCGCGCGGACCTGGTAGC 
ACAGAACACCCGCGGGCTGCCGGTTGAGCTCTTGCGTGTCGACGGGCGGCACACCGACCTCG 

ACCCGGGTTTCGACCGGGTGCTGGTGGATGCGCCCTGCACCGGGCTGGGCGCGTTACGCCGT
CGGCCGGAGGCCCGTTGGCGTCGTCAGCCGGCGGACGTAGCGGCACTGGCCAAGCTACAACG 
CGAGTTGTTGAGCGCCGCCATCGCGCTGACTCGGCCCGGCGGTGTCGTGCTCTATGCCACATG 
CTCGCCGCACCTGGCCGAGACTGTGGGTGCTGTCGCCGACGCGCTACGCCGACATCCGGTTCA 
CGCGCTCGATACCCGCCCACTGTTCGAGCCGGTGATCGCGGGGCTGGGGGAGGGGCCCCACG 

TTCAGCTGTGGCCGCACCGGCACGGTACCGACGCCATGTTCGCCGCGGCGTTGCGCCGCCTG
ACGTGA 

>Rv1409 ribG riboflavin biosynthesis TB.seq 1585192:1586208 MW:35367
>emb|AL123456|MTBH37RV:1585192-1586211, ribG SEQ ID NO:53

ATGAACGTGGAGCAGGTCAAGAGCATCGACGAGGCTATGGGTCTCGCCATCGAGCACTCCTAC
CAGGTCAAAGGCACGACTTATCCAAAACCCCCAGTGGGGGCCGTCATTGTGGATCCCAACGGT 
CGGATCGTCGGCGCCGGCGGCACCGAGCCGGCCGGTGGCGATCATGCCGAGGTGGTGGCGC 
TGCGCCGGGCCGGCGGATTGGCTGCCGGCGCCATCGTGGTGGTCACCATGGAACCCTGTAAC 
CACTACGGCAAGACTCCGCCATGCGTGAACGCTCTGATCGAAGCCAGGGTGGGGACGGTGGTC 

TACGCCGTCGCCGACCCGAACGGGATCGCTGGGGGTGGGGCGGGCCGGCTGTCAGCAGCGG
GCCTACAGGTGCGGTCCGGGGTGTTGGCTGAACAGGTGGCGGCCGGACCGCTGCGGGAGTGG 
CTCCACAAGCAACGCACCGGTCTGCCGCATGTCACCTGGAAGTACGCCACCAGCATCGACGGC 
CGCAGCGCCGCCGCCGACGGCTCCAGCCAGTGGATCTCCAGCGAGGCCGCACGCCTGGATCT 
GCATCGCCGCCGCGCCATCGCCGACGCGATCTTGGTCGGCACCGGCACCGTCCTCGCCGACG 

ACCCGGCCCTGACCGCGCGGCTGGCCGACGGCTCGCTGGCGCCGCAGCAGCCGCTGCGCGT
GGTGGTGGGCAAGCGCGACATACCGCCGGAAGCACGGGTCCTCAACGACGAGGCACGCACCA 
TGATGATCCGCACCCACGAACCTATGGAGGTGCTCAGGGCGTTGTCGGATCGCACCGACGTGC 
TGCTGGAAGGAGGTCCCACCCTCGCCGGCGCCTTCCTACGAGCGGGTGCGATCAACCGGATCC 
TGGCCTACGTCGCACCGATCCTGTTGGGCGGTCCGGTTACCGCGGTCGATGACGTCGGGGTGT 

CCAACATCACCAACGCGTTGCGTTGGCAGTTCGACAGCGTCGAAAAGGTCGGACCGGATCTGTT
GCTGAGCTTGGTGGCTCGTTAG 
>Rv1440 secG TB.seq 1617715:1618065 MW:12140
>emb|AL123456|MTBH37RV:1617715-1618068, secG SEQ ID NO:54

GTG6CAGGCGTGACAGCCGCGGTCAGTGCACGCCTCAAAQCCGATGAGGCGCGACGGCCTGG 
GTTCTACGCGGCAGGCAGCGGTCCGCTGCCGCAGGTTCGGGGGAGTACGCTACCCGTCATGG 
AATTGGCCGTGCAGATCACGCTGATCGTCACGAGCGTGCTGGTGGTGTTGTTAGTACTGCTGCA
CCGGGCCAAGGGTGGCGGGCTATCGACACTGTTCGGCGGTGGTGTGCAGTCAAGCCTGTCCG 
GCTCGACGGTGGTGGAGAAGAACCTGGACCGGTTGACGCTGTTCGTTACCGGCATCTGGCTGG 
TGTCCATCATCGGCGTGGCGTTGCTCATCAAATACCGCTAG 

>Rv1484 inhA TB.seq 1674200:1675006 MW:28529

>emb|AL123456|MTBH37RV:1674200-1675009, inhA SEQ ID NO:55 

ATGACAGGACTGCTGGACGGCAAACGGATTCTGGTTAGCGGAATCATCACCGACTCGTCGATCG 

CGTTTCACATCGCACGGGTAGCCCAGGAGCAGGGCGCCCAGCTGGTGCTCACCGGGTTCGAC 

CGGCTGCGGCTGATTCAGCGCATCACCGACCGGCTGCCGGCAAAGGCCCCGCTGCTCGAACT 

CGACGTGCAAAACGAGGAGCACCTGGCCAGCTTGGCCGGCCGGGTGACCGAGGCGATCGGGG
CGGGCAACAAGCTCGACGGGGTGGTGCATTCGATTGGGTTCATGCCGCAGACCGGGATGGGC 
ATCAACCCGTTCTTCGACGCGCCCTACGCGGATGTGTCCAAGGGCATCCACATCTCGGCGTATT 
CGTATGCTTCGATGGCCAAGGCGCTGCTGCCGATCATGAACCCCGGAGGTTCCATCGTCGGCA 
TGGACTTCGACCCGAGCCGGGCGATGCCGGCCTACAACTGGATGACGGTCGCCAAGAGCGCG 

TTGGAGTCGGTCAACAGGTTCGTGGCGCGCGAGGCCGGCAAGTACGGTGTGCGTTCGAATCTC
GTTGCCGCAGGCCCTATCCGGACGCTGGCGATGAGTGCGATCGTCGGCGGTGCGCTCGGCGA 
GGAGGCCGGCGCCCAGATCCAGCTGCTCGAGGAGGGCTGGGATCAGCGCGCTCCGATCGGCT 
GGAACATGAAGGATGCGACGCCGGTCGCCAAGACGGTGTGCGCGCTGCTGTCTGACTGGCTG 
CCGGCGACCACGGGTGACATCATCTACGCCGACGGCGGCGCGCACACCCAATTGCTCTAG 


>Rv1617 pykA pyruvate kinase TB.seq 1816187:1817602 MW:50668 
>emb|AL123456|MTBH37RV:1816187-1817605, pykA SEQ ID NO:56

GTGACGAGACGCGGGAAAATCGTCTGCACTCTCGGGCGGGCCACCCAGCGGGACGACCTGGT 
CAGAGCGCTGGTCGAGGCCGGAATGGACGTCGCCCGAATGAACTTCAGCCACGGCGACTACGA 

CGATCACAAGGTCGCCTATGAGCGGGTCCGGGTAGCCTCCGACGCCACCGGGCGCGCGGTCG
GCGTGCTCGCCGACCTGCAGGGCCCGAAGATCAGGTTGGGACGCTTCGCCTCCGGGGCCACC 
CACTGGGCCGAAGGCGAAACCGTCCGGATCACCGTGGGCGCCTGCGAGGGCAGCCACGATCG 
GGTGTCCACCACCTACAAGCGGCTAGCCCAGGACGCGGTGGCCGGTGACCGGGTGCTGGTCG 
ACGACGGCAAAGTCGCATTGGTGGTCGACGCCGTCGAGGGCGACGACGTGGTCTGCACCGTC 

GTCGAAGGCGGCCCGGTCAGCGACAACAAGGGCATCTCGTTGCCCGGAATGAACGTGACCGC
GCCGGCCCTGTCGGAGAAGGACATCGAGGATCTCACGTTCGCGCTGAACCTCGGCGTCGACAT 
GGTGGCGCTTTCCTTCGTCCGCTCCCCGGCCGATGTCGAACTGGTCCACGAGGTGATGGATCG 
6ATCG6GCGACGGGTGCCGGTGATCGCCAAGCTG6AGAAGCCGGAAGCCATGGACAATCTCG
AAGCGATCGTGCTGGCGTTCGACGCCGTCATGGTCGCTCGGGGCGACCTAGGTGTTGAGCTGC 
CGCTCGAAGAGGTCCCGCTGGTACAGAAGCGAGCCATCCAGATGGCCCGGGAGAACGCCAAG 
CCGGTCATTGTGGCGACCCAGATGCTCGACTCGATGATCGAGAACTCGCGGCCGACCCGAGCT 
GAGGCCTCCGACGTCGCCAACGCGGTGCTCGATGGCGCCGACGCGCTGATGCTGTCCGGGGA
AACCTCGGTAGGGAAGTACCCGCTTGCTGCGGTCCGGACAATGTCGCGCATCATCTGCGCGGT 
CGAGGAGAACTCCACGGCCGCACCGCCGTTGACACACATTCCCCGGACCAAGCGTGGGGTCAT 
CTCGTATGCGGCCCGTGACATCGGCGAACGACTCGACGCCAAGGCCTTGGTGGCCTTCACTCA 
GTCCGGTGATACCGTGCGGCGACTGGGCCGCCT6CATACCCCGCTGCCGCTGCTGGCCTTCAC 
CGCGTGGCCCGAGGTGCGCAGCCAACTGGCGATGACCTGGGGCACCGAGACGTTCATCGTGC
CGAAGATGCAGTCCACCGATGGCATGATCCGCCAGGTCGACAAATCGCTGCTCGAACTCGCCC 
GCTACAAGCGTGGTGACTTGGTGGTCATCGTCGCGGGTGCGCCGCCAGGCACAGTGGGTTCGA 
CCAACCTGATCCACGTGCACCGGATCGGGGAAGATGACGTCTAG 

>Rv1630 rpsA 30S ribosomal protein S1 TB.seq 1833540:1834982 MW:53203
>emb|AL123456|MTBH37RV:1833540-1834985, rpsA SEQ ID NO:57

ATGCCGAGTCCCACCGTCACCTCGCCGCAAGTAGCCGTCAACGACATAGGCTCTAGCGAGGAC 
TTTCTCGCCGCAATAGACAAAACGATCAAGTACTTCAACGATGGCGACATCGTCGAAGGCACCA 
TCGTCAAAGTGGACCGGGACGAGGTGCTCCTCGACATCGGCTACAAGACCGAAGGCGTGATCC 

CCGCCCGCGAACTGTCCATCAAGCACGACGTCGACCCCAACGAGGTGGTTTCCGTCGGTGACG
AGGTCGAAGCCCTGGTGCTCACCAAGGAGGACAAAGAGGGCCGGCTCATCCTCTCCAAGAAAC 
GCGCGCAGTACGAGCGTGCCTGGGGCACGATCGAGGCGCTCAAGGAGAAGGACGAGGCCGTC 
AAGGGCACGGTCATCGAGGTCGTCAAGGGTGGCCTGATCCTCGACATCGGGCTGCGCGGTTTC 
CTGCCCGCCTCGCTGGTGGAGATGCGCCGGGTGCGCGACCTGCAGCCCTACATCGGCAAGGA 

GATCGAGGCCAAGATCATCGAGCTGGACAAGAACCGCAACAACGTGGTGCTGTCCCGTCGCGC
CTGGCTGGAGCAGACCCAGTCCGAGGTGCGCAGCGAGTTCCTGAATAACTTGCAAAAAGGCAC 
CATCGGAAAGGGTGTCGTGTCCTCGATCGTCAACTTCGGCGCGTTCGTCGATCTCGGCGGTGT 
GGACGGTCTGGTGCATGTCTCCGAGCTATCGTGGAAGCACATCGACCACCCGTCCGAGGTGGT 
CCAGGTTGGTGACGAGGTCACCGTCGAGGTGCTCGACGTCGACATGGACCGTGAGCGGGTTTC 

GTTGTCACTCAAGGCGACTCAGGAAGACCCGTGGCGGCACTTCGCCCGCACTCACGCGATCGG
GCAGATCGTGCCGGGCAAGGTCACCAAGTTGGTTCCGTTCGGTGCATTCGTCCGCGTCGAGGA 
GGGTATCGAGGGCCTGGTGCACATCTCCGAGCTGGCCGAGCGTCACGTCGAGGTGCCCGATC 
AGGTGGTTGCCGTCGGCGACGACGCGATGGTCAAGGTCATCGACATCGACCTGGAGCGCCGTC 
GGATCTCGTTGTCGCTCAAGCAAGCCAATGAGGACTACACCGAGGAGTTCGACCCGGCGAAGT 

ACGGCATGGCCGACAGTTACGACGAGCAGGGCAACTACATCTTCCCGGAGGGCTTCGATGCCG
AAACCAACGAATGGCTTGAGGGATTCGAAAAGCAGCGCGCCGAATGGGAAGCTCGGTACGCCG 
AGGCCGAGCGCCGGCACAAGATGCACACCGCGCAGATGGAGAAGTTCGCCGCCGCCGAGGCG 
GCTGQAC6CGGCQCGGACGATCAGTCGTCGGCCAGTAQCGCACCGTCGGAAAAGACCGCGGG

TGGATCACTGGCCAGCGACGCCCAGCTGGCGGCCCTGCGGGAAAAACTCGCCGGCAGCGCTT 

GA 

>Rv1631 - TB.seq 1835011:1836231 MW:44669
>emb|AL123456|MTBH37RV:1835011-1836234, Rv1631 SEQ ID NO:58

ATGCTGCGCATCGGGCTGACCGGCGGCATTGGCGCCGGGAAGTCGTTGCTGTCCACGACGTTC 
TCGCAATGCGGCGGAATCGTTGTCGACGGCGATGTGTTGGCGCGTGAAGTGGTCCAGCCGGGC 
ACCGAGGGGCTGGCCTCGCTGGTCGACGCGTTCGGTCGCGACATCCTGCTTGCAGACGGAGC 

GCTGGACCGGCAGGCGTTGGCGGCCAAGGCGTTTCGAGATGACGAGTCGCGCGGTGTGCTCA
ACGGAATCGTGCACCCGCTGGTCGCCCGGCGCCGATCCGAGATCATCGCGGCGGTTTCGGGG 
GACGCGGTTGTGGTCGAAGATATTCCAGTGCTGGTGGMTCCGGGATGGCGCCATTGTTTCCGC 
TGGTGGTGGTGGTGCACGCCGACGTCGAGCTACGGGTGCGACGGCTGGTCGAGCAACGCGGC 
ATGGCCGAAGCCGACGCCCGGGCTAGGATCGCTGCGCAGGCCAGCGACCAGCAGCGTCGTGC 

CGTCGCCGACGTCTGGCTGGACAACTCGGGCAGCCCAGAGGATTTGGTGCGGCGGGCCCGCG
ACGTCTGGAACACGCGCGTCCAGCCCTTCGCGCACAACCTGGCCCAACGTCAGATTGCGCGCG 
CGCCGGCTAGGTTGGTGCCGGCGGATCCAAGCTGGCCGGATCAGGCGCGGCGCATCGTCAAC 
CGGCTAAAGATCGCGTGCGGGCATAAGGCCTTGCGAGTTGACCACATTGGGTCAACCGCCGTG 
TCGGGCTTCCCCGATTTTCTAGCCAAGGATGTCATCGACATCCAGGTCACCGTCGAATCACTTG 

ACGTGGCCGACGAGCTGGCCGAGCCCTTGCTGGCCGCCGGCTAGCCACGCCTCGAGCACATC
ACCCAGGACACCGAAAAGACCGACGCTCGCAGCACCGTCGGCCGCTACGACCACACCGACAGT 
GCCGCTCTGTGGCACAAGCGCGTGCACGCCTCGGCGGATCCCGGTCGGCCGACCAACGTGCA 
CCTGCGGGTGCACGGCTGGCCCAACCAACAGTTCGCCCTGCTGTTCGTCGACTGGCTGGCGGC 
CAATCCCGGCGCGAGAGAAGACTATTTGACGGTCAAGTGTGACGCCGACAGGCGCGCCGACG 

GTGAGCTCGCGCGCTACGTCACCGCCAAGGAGCCGTGGTTCCTGGATGCCTACCAGCGGGCAT
GGGAGTGGGCGGATGCGGTGCACTGGCGTCCCTGA 

>Rv1706c - TB.seq 1932695:1933876 MW:39779 
>emb|AL123456|MTBH37RV:c1933876-1932692, PPE SEQ ID NO:59 

ATGACCCTCGATGTCCCGGTCAACCAGGGGCATGTCCCCCCGGGCAGCGTCGCCTGCTGCCTT
GTTGGGGTCACGGCCGTTGCTGACGGCATCGCCGGGCATTCCCTGTCCAACTTTGG6GCGTTA 
CCTCCCGAGATCAATTCGGGTCGTATGTATAGCGGTCCGGGATCCGGGCCACTGATGGCTGCC 
GCGGCGGCCTGGGACGGGCTGGCCGCAGAGTTGTCGTCGGCAGCGACTGGCTACGGTGCGG 
CGATCTCGGAGCTGACAAACATGCGGTGGTGGTCGGGGCCGGCATCGGATTCGATGGTGGCC 

GCCGTCCTGCCCTTTGTCGGCTGGCTGAGTACCACCGCGACGCTAGCCGAACAGGCCGCGATG
CAGGCTAGGGCGGCCGCAGCGGCCTTTGAAGCCGCCTTCGCCATGACGGTGCCCCCGCCGGC 
GATCGCGGCCAACCGGACCTTGTTGATGACGCTCGTCGATACCAACTGGTTCGGGCAAAACAC 
GCCGGCGATCGCCACCACCGAGTCCCAATACGCCGAGATGTGGGCCCAAGACGCCGCCGCGA
TGTACGGCTATGCCAGCGCCGCGGCACCCGCCACGGTTTTGACTCCGTTCGCACCACCGCCGC 
AAACCACCAACGCGACCGGCCTCGTCGGCCACGCAACAGCGGTGGCCGCGCTGCGGGGGCAG 
CACAGCTGGGCCGCGGCGATTCCATGGAGCGACATACAGAAATACTGGATGATGTTCCTGGGC 

GCCCTCGCCACTGCCGAAGGGTTCATTTACGACAGCGGTGGGTTAACGCTGAATGCTCTGCAGT
TCGTCGGCGGGATGTTGTGGAGCACCGCATTGGCAGAAGCCGGTGCGGCCGAGGCAGCGGCC 
GGCGGGGGTGGAGCCGCTGGATGGTCGGCGTGGTCGCAGCTGGGAGCTGGACCGGTGGCGG 
CGAGCGCGACTCTGGCCGCCAAGATCGGACCGATGTCGGTGGCGCCGGGCTGGTCCGCACCG 
CCCGCCACGCCCCAGGCGCAAACCGTCGCGCGATCGATTCCCGGTATTCGCAGCGCCGCCGA 

GGCGGCTGAAACATCGGTCCTACTCCGGGGGGCACCGACTCCGGGCAGGAGTCGCGCCGCCC
ATATGGGACGCCGATATGGAAGACGACTCACCGTGATGGCTGACCGGCCGAACGTCGGATAG 

>Rv1745c - similar to Q46822 ORF.0182 TB.seq 1971381:1971989 MW:22490
>emb|AL123456|MTBH37RV:c1971989-1971378, Rv1745c SEQ ID NO:60

ATGACCCGCAGCTACCGGCCAGCTCCACCGATCGAGCGGGTGGTTTTGCTCAACGACCGCGGC
GACGCGACAGGTGTGGCCGACAAGGCCACCGTGCACACCGGCGACACCCCTTTGCACCTCGC 
GTTCTCCAGCTATGTGTTCGATCTGCACGATCAGCTGTTGATCACGCGGCGGGCCGCCACCAAG 
AGGACGTGGCCGGCGGTATGGACCAACAGTTGCTGCGGGCACCCCCTGCCTGGCGMTCGCT 
ACCCGGCGCCATACGCCGGCGGCTCGCTGCCGAACTCGGACTGACCCCAGATCGGGTCGATC 

TGATCCTGCCGGGGTTCCGCTACCGGGCCGCTATGGCCGATGGCACCGTGGAAAACGAGATCT
GCCCCGTCTACCGAGTCCAGGTTGACCAACAGCCCCGGCCGAACTCGGACGAGGTCGACGCG 
ATCCGCTGGTTGTCCTGGGAACAATTCGTGCGCGATGTTACCGCCGGCGTAATCGCCCCGGTAT 
CCCCTTGGTGCCGCTCACAACTGGGCTACCTGACCAAACTTGGACCATGTCCGGCACAGTGGG 
CCGTGGCCGACGACTGCCGGCTACCGAAAGCCGCACATGGTAATTAA 


>Rv1800 - TB.seq 2039451:2041415 MW:67068
>emb|AL123456|MTBH37RV:2039451-2041418, PPE SEQ ID NO:61 

ATGCTGCCGAATTTCGCGGTGCTGCCCCCCGAGGTCAATTCGGCGAGGGTGTTCGCCGGTGCG 
GGGTCGGCGCCGATGTTAGCGGCAGCGGCCGCCTGGGATGATCTAGCCTCCGAGCTGCATTGT 

GCTGCAATGTCATTCGGGTCGGTTACGTCGGGATTGGTGGTTGGGTGGTGGCAGGGATCGGCG
TCGGCGGCGATGGTGGACGCAGCCGCGTCGTACATCGGGTGGCTGAGCACGTCGGCTGCCCA 
CGCCGAGGGCGCGGCCGGTCTGGCTCGGGCCGCGGTATCGGTGTTCGAGGAGGCGCTGGCC 
GCGACGGTGCATCCGGCGATGGTTGCGGCAAATCGCGCCCAGGTGGCGTCGCTGGTAGCGTC 
GAACTTGTTTGGGCAGAACGCGCCTGCGATCGCCGCGCTCGAATCCTTGTATGAGTGTATGTGG 

GCCCAGGATGCAGCGGCCATGGCGGGTTATTACGTTGGGGCTTCGGCGGTGGCCACACAGTTG
GCATCGTGGCTGCAACGGCTACAGAGCATCCCCGGCGCCGCCAGTCTTGATGCCCGTCTGCCG 
AGCTCGGCCGAGGCACCGATGGGAGTCGTCCGCGCGGTCAACAGCGCGATCGCCGCCAATGC 
6GCTGCGGCACAMCCQTTGGCCT6GTCATGGGAGGCAGCGGCACGCCAATACCGTCGGCCA
GATATGTCGAGCTCGCGAACGCGCTGTACATGAGTGGCAGCGTCCCGGGTGTTATCGCGCAGG 
CGCTCTTCACGCCCCAAGGGCTCTACCCGGTGGTCGTGATCAAGAACCTCACTTTCGATTCCTC 
GGTGGCGCAGGGTGCCGTCATTCTCGAAAGTGCGATTCGGCAGCAAATTGCCGCCGGCAACAA 

CGTCACCGTCTTCGGCTACTCGCAGAGCGCCACGATCTCGTCACTAGTGATGGCCAATCTTGCG
GCTTCGGCCGACCCGCCGTCTCCAGACGAGCTTTCCTTCACGCTGATCGGCAATCCCAACAACC 
CCAATGGCGGGGTTGCCACCAGGTTCCCGGGGATCTCCTTTCCAAGCTTGGGCGTGACGGCCA 
CCGGGGCCACTCCGCACAATCTGTACCCGACCAAGATCTACACCATCGAATACGACGGCGTCG 
CCGACTTTCCGCGGTACCCGCTCAACTTTGTGTCGACCCTCAACGCCATTGCCGGCACCTACTA 

GGTGCACTCCAACTACTTCATCCTGACGCCGGAACAAATTGACGCAGCGGTTCCGCTGACCAAT
ACGGTCGGTCCCACGATGACCCAGTACTACATCATTCGCACGGAGAACCTGCCGCTGCTAGAG 
CCACTGCGATCGGTGCCGATCGTGGGGAACCCACTGGCGAACCTGGTTCAACCAAACTTGAAG 
GTGATTGTTAACCTGGGCTACGGCGACCCGGCCTATGGTTATTCGACCTCGCCGCCCAATGTTG 
CGACTCCGTTCGGGTTGTTCCCAGAGGTCAGCCCGGTCGTCATCGCCGACGCTCTCGTCGCCG 

GGACCCAGCAGGGAATCGGCGATTTCGCCTACGACGTCAGCCACCTCGAACTGCCGTTGCCGG
GAGACGGGTCGACGATGCCAAGCACC6CACCGGGCTCGGGTACGCCGGTCCCCCCGCTCTCG 
ATCGACAGCCTGATAGACGACCTGCAGGTGGCTAACCGCAACCTCGCCAACACGATTTCGAAG 
GTGGCCGCGACGAGCTACGCGACGGTGCTCCCAACCGCCGACATCGCCAATGCGGCGTTGAC 
GATCGTGCCGTCGTACMCATCCACCTTTTTTTGGAGGGCATCCAGCAAGCGCTCAAGGGCGAC 

GCGATGGGACTCGTCAACGCGGTCGGATACCCACTCGCGGCCGACGTGGCACTGTTCACGGCC
GCAGGCGGTCTTCAGCTCTTGATCATCATCAGCGCGGGCCGAACGATTGCCMTGACATCTCGG 
CCATTGTCCCCTGA 

>Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram -) TB.seq 2093732:2095186 MW:51548
>emb|AL123456|MTBH37RV:c2095186-2093729, gnd SEQ ID NO:62

ATGAGTTCGTCGGAATCGCCAGCCGGCATCGCGCAGATCGGCGTCACTGGCCTGGCCGTGATG 
GGTTCCAACATCGCCCGAAACTTCGCCCGGCACGGCTACACCGTGGCAGTGCACAATCGGTCG 
GTCGCCAAGACCGACGCGCTGCTTAAGGAGCACAGCTCAGACGGCAAGTTCGTGCGCAGTGAA 
ACGATCCCCGAATTTCTTGCCGCACTGGAAAAACCGCGTCGGGTGCTGATCATGGTCAAGGCC 
GGAGAGGCCACTGACGCTGACGCTGTCATCAACGAACTTGCTGACGCCATGGAACCCGGCGAC
ATCATCATCGACGGCGGCAATGCGTTGTACACCGACACCATGCGCCGCGAGAAAGCGATGCGT 
GAGCGGGGCTTGCACTTCGTCGGGGCCGGGATCTCCGGCGGCGAAGAGGGCGCGTTGAACGG 
GCCGTCGATCATGCCCGGCGGACCCGCCGAGTCATACCAATCGCTGGGTCCGCTGCTCGAGGA 
GATCTCCGCGCATGTCGACGGCGTGCCGTGCTGCACCCACATTGGCCCGGACGGCTCCGGGC 
ACTTCGTCAAGATGGTCCACAACGGCATCGAGTACTCCGACATGCAGCTCATCGGTGAGGCCTA
CCAGCTGATGCGCGACGGGCTAGGTCTGACCGCGCCGGCGATCGCCGATGTGTTCACCGAGT 
GGAACAATGGCGATCTGGACAGCTACCTGGTCGAGATCACCGCCGAGGTGCTGCGGCAGAGCG 
ATGCCAAGACCGGCAAACCGCTCGTCGACGTCATC6TGGACCGGGCCGAGCAGAAAGGCACC
GGCCGTTGGACCGTCAAGTCCGCGCTGGACCTGGGTGTGCCGGTGACCGGCATCGCCGAAGC 
GGTGTTTGCCCGCGCTCTCTCGGGATCCGTGGGGCAACGCTCGGCCGCCAGCGGTCTGGCTTC 
GGGCAAGCTCGGCGAGCAGCCCGCCGACCCCGCCACGTTCACCGAAGACGTCCGCCAGGCGT 

TGTACGCCTCCAAGATCGTGGCCTACGCTCAGGGCTTCAACCAGATCCAGGCCGGCAGCGCCG
AATTCGGCTGGGACATCACGCCGGGCGACCTGGCCACCATCTGGCGTGGCGGCTGCATCATCC 
GGGCGAAGTTCCTCAACCACATCAAGGAAGCCTTTGACGCCAGCCCGAACCTGGCCAGTCTGA 
TTGTGGCCCCGTATTTCCGCGGCGCCGTCGAATCGGCGATCGACAGTTGGCGGCGTGTGGTGT 
CGACGGCGGCCCAACTGGGTATCCCGACCCCGGGATTCTCGTCGGCCCTGTCGTATTACGACG 

CGCTGCGCACCGCGCGGCTGCCCGGTGCACTCACCCAGGCCCAGCGCGACTTCTTCGGCGCA
CACACCTACGGCCGGATCGACGAACCAGGCAAGTTCCACACACTATGGAGTTCAGACCGCACC 
GAAGTACCGGTGTAG 

>Rv1900c lipJ TB.seq 2146246:2147631 MW:49685
>emb|AL123456|MTBH37RV:c2147631-2146243, lipJ SEQ ID NO:63

GTGGCGCAGGCTCCCCACATTCACAGGACCCGCTACGCAAAATGCGGCGACATGGATATCGGC 
TACCAGGTGCTGGGTGACGGTCCGACGGATCTGCTGGTGTTGCCGGGGCCGTTCGTGCCGATC 
GACTCGATCGACGACGAGCCATCGCTGTACCGTTTCCATCGCCGTCTTGCGTCATTCAGCAGGG 
TGATCCGCCTCGACCATCGTGGGGTCGGCCTGTCGTCACGGCTCGCCGCGATAACCACGCTGG 

GGCCGAAGTTCTGGGCCCAGGACGCGATCGCGGTGATGGACGCGGTCGGATGCGAGCAGGCG
ACMTmCGCGCCCAGTTTCCACGCCATGMCGGACTTGTTCTCGCCGCCGACTACCCCGAGC 
GGGTGCGCAGCCTGATCGTCGTCAACGGCTCGGCGCGCCCACTATGGGCGCCCGACTACCCG 
GTAGGCGCCCAGGTTCGTCGAGCTGACCCGTTCCTGACGGTGGCGCTGGAACCGGATGCCGTC 
GAGCGGGGCTTCGACGTGCTGAGCATCGTGGCTCCTACCGTGGCCGGAGATGACGTGTTTCGA 

GCCTGGTGGGATCTCGCCGGCAACCGTGCCGGACCGCCGAGCATTGCCCGTGCCGTTTCAAAG
GTCATAGCCGAGGCCGACGTACGAGATGTCTTGGGACACATCGAGGCTCCAACACTGATCTTGC 
ACCGTGTCGGATCGACGTACATCCCGGTGGGACATGGTCGCTACCTCGCCGAGCACATCGCTG 
GATCCCGCTTGGTCGAACTACCCGGCACCGATACCCTGTACTGGGTTGGCGACACCGGGCCGA 
TGCTCGATGAAATCGAGGAATTCATCACCGGCGTGCGCGGCGGCGCTGACGCCGAGCGCATGC 

TTGCCACCATCATGTTTACCGACATCGTCGGCTCGACCCAGCACGCCGCCGCGCTCGGCGACG
ACCGATGGCGCGACCTGTTGGACAACCACGACACCATCGTGTGCCACGAAATCCAGCGGTTCG 
GCGGTCGCGAAGTGAACACGGCCGGTGACGGTTTCGTCGCGACGTTCACCAGTCCGAGTGCC 
GCGATCGCGTGCGCGGACGACATCGTCGACGCGGTCGCCGCGCTGGGTATTGAGGTCCGGAT 
CGGTATTCATGCGGGCGAGGTCGAGGTGCGCGATGCCTCGCACGGTACCGACGTCGCCGGCG 

TGGCCGTGCATATCGGTGCGCGCGTCTGCGCGCTGGCCGGACCCAGTGAGGTGCTGGTGTCC
TCGACCGTGCGAGACATCGTCGCCGGATCACGGCACCGGTTCGCCGAGCGTGGTGAGCAGGA 
ACTCAAGGGCGTACCGGGCAGAT6GCGGCTATGCGTGCTCATGCGCGACGACGCCACCCGCA
CGCGCTAA 

>Rv1967 - TB.seq 2210599:2211624 MW:36516 

>emb|AL123456|MTBH37RV:2210599-2211627, Rv1967 SEQ ID NO:64

ATGAGGGAGAACCTGGGGGGCGTCGTGGTGCGCCTCGGCGTCTTCCTGGCGGTATGCGTGCT 
GACGGCGTTCCTGCTGATTGCCGTCTTCGGGGAGGTGCGCTTCGGCGACGGCAAGACCTACTA 
CGCCGAGTTCGCCAACGTGTCCAATCTGCGAACGGGCAAGCTGGTGCGCATCGCCGGCGTCGA 
GGTCGGCAAGGTCACCAGGATCTCCATCAACCCCGACGCGACGGTGCGGGTGCAGTTCACCGC 

CGACAACTCGGTCACCCTCACGCGGGGCACCCGGGCGGTGATCCGCTACGACAACCTGTTCGG
TGACCGCTATTTGGCGCTGGAGGAAGGGGCCGGCGGACTCGCCGTTCTTCGTCCCGGTCACAC 
GATTCCGTTGGCGCGCACCCAACCGGCGTTGGATCTGGATGCCCTGATCGGTGGATTCAAGCC 
GCTGTTTCGTGCGCTGAACCCCGAGCAGGTCAACGCGCTGAGCGAACAGTTGCTGCACGCGTT 
TGCCGGACAGGGGCCCACGATCGGGTCATTGCTGGCCCAGTCCGCGGCCGTGACCAACACCC 

TGGCCGACCGTGATCGGCTGATCGGGCAGGTGATCACCAACCTCAACGTGGTGCTGGGCTCGC
TGGGCGCTCACACCGATCGGTTGGACCAGGCGGTGACGTCGCTATCAGCGTTGATTCACCGGC 
TCGCGCAACGCAAGACCGACATCTGCAACGCCGTGGCCTACACCAACGCCGCCGCCGGCTCG 
GTCGCCGATCTGCTGTCGCAGGCTCGCGCGGCGTTGGCGAAGGTGGTTCGCGAGACCGATCG 
GGTGGCCGGCATCGCGGCCGCCGACCACGACTACCTCGACAATCTGCTCAACACGCTGCCGGA 

CAAATACCAGGCGCTGGTCCGCCAGGGTATGTACGGCGACTTCTTCGCCTTCTACCTGTGCGAC
GTCGTGCTCAAGGTCAACGGCAAGGGCGGCCAGCCGGTGTACATCAAGCTGGCCGGTCAGGA 
CAGCGGGCGGTGCGCGCCGAAATGA 

>Rv1975 - TB.seq 2218050:2218712 MW:23650 
>emb|AL123456|MTBH37RV:2218050-2218715, Rv1975 SEQ ID NO:65

ATGTCGCGTCGAGCATCGGCCACGTGTGCCTTGTCCGCGACCACCGCCGTCGCCATAATGGCT 
GCTCCCGCCGCACGGGCCGACGACAAGCGGCTCAACGACGGCGTGGTCGCCAACGTCTACAC 
CGTTCAACGTCAGGCCGGCTGCACCAACGACGTCACGATCAACCCGCAACTACAATTGGCCGC 
CCAATGGCACACCCTCGATCTGCTGAAGAACCGGCACCTCAACGACGACACCGGTTCTGACGG 
ATCCACACCGCAAGACCGCGCGCATGCCGCCGGCTTCCGCGGGAAAGTCGCTGAAACCGTGG
CGATCAATCCCGCCGTAGCGATCAGCGGCATCGAGTTGATAAACCAGTGGTACTACAACCCCGC 
GTTTTTCGCGATCATGTCCGACTGCGCCAACACCCAGATCGGGGTGTGGTCAGAAAACAGCCC 
GGATCGCACCGTCGTGGTGGCGGTTTACGGACAGCCCGATCGACCTTCCGCGATGCCGCCCAG 
GGGAGCGGTAACGGGACCGCCGTCCCCGGTGGCCGCGCAAGAGAACGTTCCTATCGACCCCA 
GCCCCGACTACGACGCCAGCGACGAGATCGAATACGGCATCAACTGGCTGCCATGGATCCTGC
GCGGCGTGTACCCGCCGCCCGCAATGCCGCCGCAGTAG 
>Rv1981c nrdF ribonucleotide reductase small subunit TB.seq 2224221:2225186 MW:36591
>emb|AL123456|MTBH37RV:c2225186-2224218, nrdF SEQ ID NO:66

ATGACCGGCAAGGTCGTTGAGCGGGTGCACGCAATCAATTGGAACCGGTTGCTCGATGCTAAA 
GATTTGCAGGTCTGGGMCGTTTGACCGGTMCTTRGGTTGCCGGAAAAGATTCCGCTCTCCA 
ACGACCTGGCATCTTGGCAAAGGTTGAGTTCCACCGAGCAGCAGACGACGATCCGGGTGTTCA
CCGGCTTGACCCTGCTCGACACCGCGCAGGCGACGGTGGGAGCAGTGGCCATGATCGACGAC 
GCGGTCACCCCCCACGAAGAGGCGGTCCTGACCAACATGGCGTTCATGGAGTCAGTGCACGCC 
AAGAGCTACAGCTCGATCTTCTCGACCCTGTGCTCGACCAAGCAGATCGACGATGCCTTCGACT 
GGTCGGAACAGAACCCTTACCTGCAGCGAAAAGCGCAGATCATCGTGGACTACTACCGCGGTG 

ACGACGCGCTCAAGCGCAAAGCATCGTCGGTAATGCTGGAGTCCTTCCTGTTCTACTCCGGCTT
CTACCTGCCCATGTACTGGTCGTCGCGGGGTAAGCTCACCAACACCGCCGATCTGATCCGGCT 
GATCATCCGAGATGAAGCCGTCCACGGCTACTACATCGGCTACAAATGTCAACGAGGTTTGGCC 
GACCTGACCGACGCCGAGCGGGCCGACCACCGCGAATACACCTGCGAGCTGCTGCACACGCT 
CTACGCGAACGAGATCGACTATGCGCACGACTTGTACGACGAGTTGGGCTGGACCGACGACGT 

TTTGCCCTACATGCGTTACAACGCCAACAAGGCGCTAGCCAACCTGGGATACCAGCCTGCATTC
GATCGTGACACCTGCCAGGTGAACCCGGCCGTGCGCGCAGCTCTCGACCCCGGTGCAGGGGA 
GAACCACGACTTTTTCTCCGGCTCCGGAAGCTCATACGTAATGGGCACCCACCAACCCACCACC 
GACACCGACTGGGACTTCTAA 

>Rv2092c helY helicase, Ski2 subfamily TB.seq 2349335:2352052 MW:99576
>emb|AL123456|MTBH37RV:c2352052-2349332, helY SEQ ID NO:67 

GTGACTGAGCTGGCCGAGCTGGACCGGTTCACGGCGGAACTAGCGTTCTCGCTCGACGACTTT 
CAGQAGCGGGCTTGCAGCGCGCTGGAACGCGGCCACGGTGTGCTGGTGTGCGCGCCGACCG 
GCGCTGGCAAGACGGTGGTCGGCGAGTTCGCCGTGCACGTGGCGCTGGCGGCCGGCAGTAAA 

TGTTTGTACACCACGCCGCTGAAAGCCCTGAGCAACCAAAAGCACACCGATCTCACAGCACGCT
ACGGCCGTGACCAGATCGGGCTGCTGACCGGTGACCTGTCGGTCAACGGCAACGCGCCGGTG 
GTGGTGATGACCACCGAAGTGCTGCGCAACATGCTCTACGCGGATTCGCCTGCGCTGCAGGGG 
CTTTCCTATGTGGTGATGGATGAGGTGCATTTCCTCGCCGACCGGATGCGGGGTCCGGTGTGG 
GAGGAGGTGATCCTGCAACTGCCCGACGACGTGCGGGTGGTCAGCCTGTCGGCGACGGTGAG 

CAACGCCGAGGAGTTCGGCGGTTGGATCCAGACGGTGCGGGGCGACACCACGGTGGTGGTCG
ACGAGCATCGGCCGGTGCCGTTGTGGCAACACGTCTTGGTGGGCAAGCGCATGTTCGACCTGT 
TCGATTACCGGATCGGCGAAGCCGAAGGGCAGCCCCAAGTCAAGCGCGAGTTGCTGCGCCACA 
TCGCGCATCGCCGTGAGGCCGACCGGATGGCCGATTGGCAGCCTCGGGGCCGAGGCTCGGGC 
CGGCCCGGCTTCTACCGGCCACCCGGCCGACCGGAGGTGATCGCCAAACTCGACGCTGAAGG 

GCTGTTGCCGGCGATCACCTTCGTGTTCTCCCGGGCCGGTTGTGACGCCGCGGTCACCCAATG
CCTGCGGTCACCGCTGCGGTTGAGCAGCGAAGAGGAGCGCGCACGGATCGCCGAGGTGATCG 
ACCACCGCTGCGGTGACCTGGCCGACTCCGACCTGGCGGTACTCGGCTACTACGAATGGCGG 
GAAGGGTTACTGCGCGGTCTGGCCGCCCACCACGCGGGCATGTTGCCGGCCTTCCGGCACAC
GGTGGAGGAGCTGTTCACCGCCGGTTTGGTCAAGGCTGTATTCGCCACCGAGACTCTGGCGCT 
CGGTATCAACATGCCGGCCGGCACGGTGGTGCTGGAGCGGCTGGTGAAGTTCAACGGTGAGCA 
GCACATGCCGCTGACGCCGGGGGAGTACACCCAACTGACCGGTCGCGCCGGCCGGCGCGGTA 
TCGACGTCGAGGGTCACGCGGTGGTGATCTGGCACCCGGAAATTGAACCGTCCGAGGTGGCG
GGCCTGGCCTCCACCCGCACCTTTCCGCTGCGCAGCTCGTTTGCCCCGTCGTAGAACATGACG 
ATCAACCTGGTGCACCGGATGGGTCCGCAACAGGCGCACCGACTGCTCGAGCAGTCGTTCGCC 
CAATATCAGGCCGACCGATCCGTGGTCGGACTGGTCCGCGGAATTGAGCGGGGCAACAGGATA 
CTCGGCGAGATCGCAGCCGAACTGGGCGGATCTGATGCGCCCATCCTCGAATACGCTCGATTG 

CGCGCGCGGGTGTCCGAGCTGGAACGTGCGCAGGCCCGCGCGTCGCGGTTACAGCGACGGC
AGGCGGCCACCGATGCGCTGGCCGCGCTGCGCCGCGGTGACATCATCACCATCACCCACGGC 
CGCCGCGGTGGTCTGGCCGTCGTCCTGGAATCAGCCCGCGACCGCGACGACCCGCGTCCGCT 
GGTGCTAACCGAACACCGATGGGCGGGACGGATCTCCTCGGCCGACTACTCGGGCACGACGC 
CGGTGGGGTCGATGACGCTGCCCAAGCGGGTGGAGCACCGCCAGCCGCGGGTCCGGCGTGA 

CCTGGCCTCGGCGCTGCGATCGGCAGCCGCGGGTCTGGTTATTCCAGCCGCCCGGCGCGTCA
GCGAGGCCGGCGGGTTTCACGATCCGGAGCTGGAGTCGTCGCGCGAACAATTGCGCCGTCAT 
CCGGTGCATACCTCGCCCGGGCTCGAGGACCAGATCCGCCAGGCCGAGCGTTACTTACGCATC 
GAACGCGACAACGCGCAATTAGAGAGGAAGGTCGCCGCCGCCACCAACTCGTTGGCCCGCAC 
GTTCGACCGATTCGTCGGGCTGCTCACCGAACGGGAGTTCATCGATGGCCCGGCCACTGATCC 

CGTGGTCACCGACGACGGCCGGCTGCTGGCGCGGATTTACAGCGAGAGCGACCTGTTGGTGG
CCGAGTGCCTACGTACAGGTGCGTGGGAGGGTTTAAAGCCGGCCGAATTGGCGGGGGTGGTG 
TCGGCGGTGGTCTACGAGACGCGCGGTGGTGACGGCCAGGGCGCCCCGTTCGGAGCCGATGT 
GCCCACACGGCGGTTACGGCAGGCTCTGACTCAGACATCAAGGCTGTCCACGACATTGCGCGC 
CGACGAGCAGGCACACCGCATCACCCCGAGTCGCGAACCCGACGATGGCTTTGTCAGAGTCAT 

CTACCGCTGGTCGCGAACCGGTGATCTAGCGGCGGCATTGGCCGCTGCCGACGTGAACGGCA
GCGGATCACCGTTATTGGCAGGGGAT1TCGTGCGTTGGTGCCGTCAGGTGCTCGATCTGCTGG 
ACCAAGTTCGTAACGCTGCGCCCAACCCCGAACTGCGGGCTACCGCAAAGCGCGCTATCGGTG 
ACATTCGGCGCGGCGTCGTCGCGGTTGACGCCGGGTAG 

>Rv2101 helZ helicase, Snf2/Rad54 family TB.seq 2360238:2363276 MW:111632
>emb|AL123456|MTBH37RV:2360238-2363279, helZ SEQ ID NO:68

ATGCTGGTTTTGCACGGCTTCTGGTCCAACTCCGGCGGGATGCGGCTGTGGGCGGAGGACTCC 
GATCTGCTGGTGAAGAGCCCGAGTCAGGCGCTGCGCTCCGCGCGGCCACACCCGTTCGCGGC 
GCCCGCTGACCTGATCGCCGGCATACATCCGGGCAMCCCGCMCCGCCGTTTTGCTGTTGCC 
GTCGTTGCGATCGGCGCCGCTGGACTCGCCGGAGCTGATCCGGCTCGCCCCGCGCCCGGCCG
CGCGAACCGATCCGATGCTGTTGGCGTGGACGGTACCGGTGGTGGACCTGGACCCCACCGCG 
GCGTTGGCCGCCTTCGACCAGCCCGCCCCCGACGTCCGCTACGGCGCGTCCGTCGACTACCT 
ggccgagctggccgttttcgcgc6cgagttggtcgagcgtggtcgcgtgctgccccagctgc
gccgcgacacccacggcgcggccgcctgctggcgtccggtgttgcagggacgcgacgtggtc 
gcgatgacctcgctggtctcggcgatgccgccggtctgccgcgccgaagttggtgggcacga 
cccgcacgaactggcaacctcggctctggacgcgatggtcgacgccgccgtgcgcgcggcgc 

tgtcaccgatggacctgctgcccccgcgacggggtcgctccaaacggcatcgggccgtggag
gcttggctgaccgcgttgacctgcccggacggccggttcgacgcggagcccgacgaactcga 
cgcgctggccgaggcgttgcggccatgggacgacgtcggtatcggcaccgtcggcccggcgc 
gggcgacgtttcggctgtccgaagtcgagaccgaaaacgaggagacgcccgcgggctcgttg 
tggaggctggagttcttattgcagtcgacgcaggaccccagcctgctggt(x)ccgccgagcag 

ggatggaacgacgacggcagcctgcgccgctggctggaccgggcgcaggagctgctgctgac
cgaactgggccgggcctctc6gattttccccgagctcgtcccggcgctgcgcaccgcgt6cc 
cgtccgggcttgagctcgacgccgacggcgcctaccgattcctgtcgggtacggccgcggtg 
ctcgacgaggctgggtttggcgtgctgctgccgtcctggtgggaccgccgccgcaagctggg 
cttggtcctgtgggcatataccccggtcgacggcgtggtgggcaagggcagcaagttcggccg 

cgagcagctcgtcgagttccgctgggagctggccgtgggcgacgatccgctcagcgaggagg
agatcgcggcgctgaccgaaaccaagtccccgctgatccggctgcgtggccagtgggtcgcg 
ctcgataccgaacagatgcgccgcgggctggagtttttggagcgtaagccaaccggccgcaag 
accaccgccgagatcctcgcgctggccgccagccaccccgacgacgtggacaccccgctcga 
ggtcaccgccgtacgcgccgacggctggctcggggacctgctcgccggggccgccgcggcg 

tcgctgcagccgttggacccgcccgacggattcaccgcgacgctgcgtccctaccagcagcgc
ggtctggcgtggctggcgtttttgtcctcgctcggtttgggcagctgcctggccgacgacatg 
ggcctgggcaagacggtgcagctattggccctggaaaccttggaatccgttcagcgccaccag 
gatcgcggcgtcggacccacactgctactgtgcccgatgtcgttggtgggcaactggccgcag 
gaagcggccaggtttgcacccaacctgcgggtgtacgcccaccacgggggcgcccggctgca 

cggcgaggcgttgcgcgaccacctcgagcgcaccgacctggtcgtgagcacctataccaccg
ccacccgcgacatcgacgagctggcggaatacgaatggaaccgggtggtgctggacgaggcc 
caggcggtgaagaacagcctgtcccgggcggccaaggcggtgcgacggctacgcgcggcgc 
accgggtcgcgctgaccgggacaccgatggagaaccggctcgccgagctgtggtcgatcatg 
gacttcctcaacccgggcctgctcggatcctccgaacgcttccgcacccgctacgcgatcccg 

atcgagcggcacgggcacaccgaaccggccgaacggctgcgcgcatcgacgcggccctacat
cctgcgccggctcaagaccgacccggcgatcatcgacgatctgccggagaagatcgagatcaa 
gcagtactgccaactcaccaccgagcaggcgtcgctgtatcaggccgtcgtcgccgacatgat 
ggaaaagatcgaaaagaccgaagggatcgagcggcgcggcaacgtgctggccgcgatggcca 
agctcaaacaggtgtgcaaccaccccgcccagctgctgcacgatcgctccccggtcggtcggc 

ggtccgggaaggtgatccggctcgaggagatcctggaagagatcctggccgagggcgaccgg
gtgctgtgttttacccagttcaccgagttcgccgagctgctggtgccgcacctggccgcacgc 
ttcggccgtgccgcccgagacattgcctacctgcacggtggcaccccgaggaagcggcgtga 
CGAGATGGTGGCCCGGTTCCAGTCCGGTGACGGCCCGCCCATTTTTCTGCTGTCGTTGAAGGC

GGGCGGTACCGGGCTGAACCTCACCGCCGCCAATCATGTTGTGCACCTGGACCGCTGGTGGAA 

CCGGGCGGTCGAGAACCAGGCGACGGACCGGGCGTTTCGGATCGGGCAGCGGCGCAGGGTG 

CAGGTCCGGAAGTTCATCTGCACCGGCACCCTCGAGGAGAAGATCGACGAAATGATCGAGGAG 

AAAAAGGCGCTGGCCGACTTGGTGGTCACCGACGGCGAAGGCTGGCTGACCGAACTGTCCACC 

CGCGATCTGCGCGAGGTGTTCGCGCTGTCCGAAGGCGCCGTCGGTGAGTAG 

>Rv2110c prcB proteasome [beta]-type subunit 2 TB.seq 2369727:2370599 MW:30274
>emb|AL123456|MTBH37RV:c2370599-2369724, prcB SEQ ID NO:69 

GTGACCTGGCCGTTGCCCGATCGCCTGTCCATTAATTCACTCTCTGGAACACCCGCTGTAGACC 

TATCTTCTTTCACTGACTTCCTGCGCCGCCAGGCGCCGGAGTTGCTGCCGGCAAGCATCAGCG 

GCGGTGCGCCACTCGCAGGCGGCGATGCGCAACTGCCGCACGGCACCACCATTGTCGCGCTG 

AAATACCCCGGCGGTGTTGTCATGGCGGGTGACCGGCGTTCGACGCAGGGCAACATGATTTCT 

GGGCGTGATGTGCGCAAGGTGTATATCACCGATGACTACACCGCTACCGGCATCGCTGGCACG 

GCTGCGGTCGCGGTTGAGTTTGCCCGGCTGTATGCCGTGGAACTTGAGCACTACGAGAAGCTC 

GAGGGTGTGCCGCTGACGTTTGCCGGCAAAATCAACCGGCTGGCGATTATGGTGCGTGGCAAT 

CTGGCGGCCGCGATGCAGGGTCTGCTGGCGTTGCCGTTGCTGGCGGGCTACGACATTCATGCG 

TCTGACCCGCAGAGCGCGGGTCGTATCGTTTCGTTCGACGCCGCCGGCGGTTGGAACATCGAG 

GAAGAGGGCTATCAGGCGGTGGGCTCGGGTTCGCTGTTCGCGAAGTCGTCGATGAAGAAGTTG 

TATTCGCAGGTTACCGACGGTGATTCGGGGCTGCGGGTGGCGGTCGAGGCGCTCTACGACGCC 

GCCGACGACGACTCCGCCACCGGCGGTCCGGACCTGGTGCGGGGCATCTTTCCGACGGCGGT 

GATCATCGACGCCGACGGGGCGGTTGACGTGCCGGAGAGCCGGATTGCCGAATTGGCCCGCG 

CGATCATCGAAAGCCGTTCGGGTGCGGATACTTTCGGCTCCGATGGCGGTGAGAAGTGA 

>Rv2118c - = B2126_C1_165 (83.6%) TB.seq 2377471:2378310 MW:30091
>emb|AL123456|MTBH37RV:c2378310-2377468, Rv2118c SEQ ID NO:70

GTGTCAGCAACCGGCCCATTCAGCATCGGCGAACGTGTTCAGCTCACCGACGCTAAGGGGCGC 

CGCTACACCATGTCGCTGACTCCCGGTGCCGAATTCCACACTCATCGTGGCTCGATCGCCCACG 

ACGCGGTGATCGGGTTGGAGCAAGGGAGCGTGGTCAAATCCAGCAACGGCGCCCTG7TCCTGG 

TGCTGCGCCCGCTGCTGGTCGACTACGTCATGTCGATGCCGCGCGGCCCGCAGGTGATCTATC 

CCAAAGATGCGGCCCAGATCGTGCATGAGGGCGACATATTTCCCGGCGCGCGGGTGCTGGAG 

GCAGGAGCCGGATCCGGTGCTCTGACCTTGTCTTTGCTGCGGGCGGTTGGGCCGGCCGGACA 

GGTGATCTCCTACGAACAGCGCGCCGATCATGCCGAACACGCCCGGCGCAATGTGAGCGGCTG 

CTACGGCCAGCCGCCGGACAACTGGCGACTGGTCGTCAGCGACCTCGCCGACTCCGAACTGC 

CCGACGGATCCGTTGATCGGGCCGTGCTCGACATGCTGGCGCCGTGGGAGGTGCTCGACGCG 

GTATCGCGGCTGCTGGTCGCCGGCGGAGTGCTGATGGTCTACGTGGCCACCGTCACTCAGCTG 

TCGAGGATCGTGGAGGCACTGCGGGCCAAGCAGTGCTGGACCGAACCGAGAGCCTGGGAGAC 
GCTGCAGCGGGGCTGGAACGTCGTAGGGTTGGCGGTTCGGCCGCAGCATTCGATGCGCGG6C
ATACCGCGTTCCTGGTAGCAACGCGCCGGTTGGCGCCGGGGGCTGTGGCTCCGGCGCCGCTA 
GGTCGTAAGCGCGAGGGACGCGACGGGTAG 

>Rv2144c - TB.seq 2404166:2404519 MW:12028

>emb|AL123456|MTBH37RV:c2404519-2404163, Rv2144c SEQ ID NO:71 

ATGCTGATCATTGCGCTGGTCTTGGCCCTGATTGGGCTCCTGGCCTTGGTGTTCGCGGTGGTCA 

CCAGCAACCAGCTAGTGGCCTGGGTATGCATCGGGGCCAGCGTGCTGGGTGTGGCGTTGCTGA 

TCGTCGATGCGTTGCGAGAACGCCAGCAAGGTGGCGCGGACGAAGCTGATGGGGCTGGGGAA 

ACGGGTGTCGCGGAGGAAGCCGACGTCGACTACCCGGAGGAAGCCCCCGAGGAGAGCCAAGC 

CGTCGACGCCGGTGTCATCGGCAGTGAGGAGCCATCGGAGGAGGCCAGCGAAGCGACCGAGG 

AGTCGGCGGTATCGGCGGACCGAAGCGACGACAGCGCCAAGTAG 

>Rv2146c - TB.seq 2405667:2405954 MW:10805 

>emb|AL123456|MTBH37RV:c2405954-2405664, Rv2146c SEQ ID NO:72 

TTGGTGGTGI 1 1 M 1 CAGATCCTTGGGTTCGCGCTGTTCATCTTCTGGCTGCTGCTGATCGCTGG 

GGTCGTCGTTGAGTTCATCCGCTCGTTCAGCCGTGACTGGCGTCCCACCGGTGTCACCGTGGT 

GATGTTGGAGATCATCATGTCGATCACTGATCCGCCGGTGAAGGTGCTGCGCCGGCTGATCCC 

GCAACTCACGATCGGCGCGGTCCGGTTCGACCTGTCGATCATGGTGCTGCTGCTGGTTGCGTT 

CATCGGTATGCAACTGGCGTTTGGTGCTGCGGCCTGA 

>Rv2147c - TB.seq 2406119:2406841 MW:27630

>emb|AL123456|MTBH37RV:c2406841-2406116, Rv2147c SEQ ID NO:73 

GTGAATAGTCACTGTAGTCACACCTTCATCACAGACAACAGATCTCCCAGGGCTAGAAGGGGTC 

ACGCAATGAGCACACTGCACAAGGTCAAGGCCTACTTCGGTATGGCTCCCATGGAGGATTACGA 

CGACGAGTACTACGACGACCGCGCTCCCTCGCGCGGGTATGCGCGGCCCCGATTCGACGACG 

ACTACGGCCGCTACGATGGGCGCGACTACGACGACGCGCGCAGCGATTCACGCGGTGACCTG 

CGCGGTGAGCCGGCCGACTATCCACCACCGGGATATCGCGGCGGGTACGCGGACGAACCACG 

TTTCCGGCCCCGGGAGTTCGACCGCGCGGAGATGACACGGCCGCGCTTCGGATCGTGGCTGC 

GCAACTCCACCCGCGGCGCGCTAGCGATGGACCCCCGCCGGATGGCGATGATGTTCGAGGAT 

GGCCATCCGCTCTCGAAGATCACCACGCTGCGGCCCAAGGACTACAGCGAGGCTCGCACCATC 

GGTGAGCGGTTCCGCGACGGCAGCCCGGTCATCATGGATCTGGTGTCGATGGACAACGCCGAT 

GCCMGCGGCTGGTCGATTTCGCGGCCGGCCTGGCCTTCGCGCTGCGCGGCTCGTTCGACAA 

GGTCGCGACCAAGGTGTTCCTGCTCTCGCCTGCAGACGTCGATGTGTCCCCCGAGGAGCGCCG 

CAGGATCGCCGAAACCGGGTTCTACGCCTACCAATAG 

>Rv2148c - TB.seq 2406841:2407614 MW:27694 
>emb|AL123456|MTBH37RV:c2407614-2406838, Rv2148c SEQ ID NO:74

ATGGCGGCGGATCTTTCGGCGTATCCAGACCGCGAATCGGAATTGACGCATGCGTTGGCGGCA 
ATGCGATCGCGACTTGCGGCGGCCGCGGAGGCGGCGGGTCGCAATGTCGGCGAAATTGAACT 
TCTACCGATTACCAMTTCTTTC^ 

CGTTGGCGMTCGGGCGMCAGGMGCTTCAGCCAAGATGGCCGAAC7TAATCGGTTGTTGGC 

GGCTGCCGAGTTGGGTCACTCGGGGGGTGTGCACTGGCACATGGTGGGCCGGATTCAACGCA 

ACAAAGCCGGGTCGCTGGCTCGCTGGGCGCACACCGCTCACTCGGTGGACAGCTCGCGGTTG 

GTGACCGCGCTGGATCGGGCGGTTGTTGCGGCGCTGGCCGAACACCGTCGTGGCGAGCGGCT 

GCGGGTTTACGTCCAGGTCAGCCTCGACGGTGACGGATCCCGGGGCGGCGTCGACAGCACGA 

CGCCCGGCGCCGTAGACCGGATTTGCGCGCAGGTGCAGGAGTCAGAGGGCCTCGAACTGGTC 

GGGTTGATGGGCATTCCGCCGCTGGATTGGGACCCGGACGAGGCCTTTGACCGGCTGCAATCG 

GAGCACAACCGGGTGCGTGCGATGTTCCCGCACGCGATCGGTCTGTCGGCGGGCATGTCCAAC 

GACCTTGAAGTCGCCGTCAAACAT6GTTCGACCTGTGTGCGTGTCGGTACCGCGCTATTGGGTC 

CGCGGCGGTTACGGTCACCGTGA 

>Rv2150c ftsZ TB.seq 2408386:2409522 MW:38757 
>emb|AL123456|MTBH37RV:c2409522-2408383, ftsZ SEQ ID NO:75 

ATGACCCCCCCGCACAACTACCTGGCCGTCATCAAGGTCGTGGGTATCGGTGGTGGCGGTGTC 

AACGCCGTCAACCGAATGATCGAGCAGGGCCTCAAAGGCGTGGAATTCATCGCGATCAACACC 

GACGCCCAGGCGTTGTTGATGAGCGATGCCGACGTCAAACTCGACGTCGGCCGCGACTCCACC 

CGCGGGCTGGGCGCCGGCGCCGATCCGGAGGTCGGCCGTAAGGCCGCCGAGGACGCCAAGG 

ACGAGATCGAAGAGCTGCTGCGCGGTGCCGACATGGTGTTTGTCACCGCCGGCGAGGGGGGC 

GGAACCGGCACCGGGGGGGCACCCGTCGTCGCCAGCATCGCCCGCAAGCTGGGCGCGTTGAC 

CGTGGGTGTGGTCACCCGGCCGTTCTCGTTCGAGGGCAAGCGACGCAGCAATCAGGCCGAAAA 

TGGCATCGCGGCGCTGCGGGAGAGTTGCGACACCCTCATCGTGATTCCCAACGACCGGTTGCT 

GCAGATGGGAGATGCCGCGGTATCGCTGATGGATGCTTTCCGTAGCGCCGACGAGGTGCTGCT 

CAACGGCGTGCAGGGCATCACCGACCTGATTACCACCCCGGGTCTAATCAAGGTCGACTTCGC 

CGACGTCAAGGGCATCATGTCCGGTGCCGGCACCGCACTGATGGGCATCGGCTCGGCCCGGG 

GCGAAGGCCGGTCGCTCAAAGCGGCCGAGATCGCCATCAACTCGCCGTTGCTGGAAGCCTCGA 

TGGAGGGCGCGCAAGGCGTGCTGATGTCGATCGCCGGCGGCAGCGACTTGGGCTTGTTCGAG 

ATCAACGAGGCGGCCTCGTTGGTACAAGACGCCGCTCACCCCGATGCCAACATCATCTTCGGC 

ACCGTCATCGACGATTCGCTCGGTGACGAGGTGCGGGTGACCGTGATCGCGGCCGGCTTCGAC 

GTCAGCGGTCCCGGCCGCAAGCCGGTGATGGGCGAGACCGGCGGCGCCCACCGGATCGAGT 

CAGCCAAGGCAGGCAAGCTCACCTCGACCTTGTTCGAGCCGGTCGACGCCGTCAGCGTGCCGT 

TGCACACCAACGGCGCAACCCTGAGCATCGGCGGTGATGACGACGATGTCGACGTGCCGCCCT 

TCATGCGCCGCTGA 
>Rv2152c murC TB.seq 2410639:2412120 MW:51146
>emb|AL123456|MTBH37RV:c2412120-2410636, murC SEQ ID NO:76

GTGAGCACCGAGCAGTTGCCGCCCQATCTGCGGCGGGT6CACATGGTCGGCATCGGCGGAGC 
TGGCATGTCGGGCATCGCCCGAATCCTGCTGGACCGCGGCGGGCTGGTCTCCGGGTCAGACG 

CCAAGGAGTCGCGCGGTGTGCATGCGCTGCGGGCGCGGGGCGCGTTGATCCGGATCGGACAC
GAGGCGTCGTCGCTGGACCTGTTGCCCGGTGGCGCCACGGCGGTCGTCACTACCCATGCCGC 
CATCCCCAAAACCAACCCCGAGCTCGTCGAAGCGAGGCGCCGCGGCATTCCCGTGGTGCTGCG 
GCCGGCCGTGCTGGCCAAGTTGATGGCCGGGCGCACCACATTGATGGTCACCGGCACGCACG 
GCAAGACAACGACGACGTCCATGCTGATCGTCGCCCTGCAGCACTGCGGGCTTGACCCGTCCT 

TTGCGGTCGGCGGTGAGCTGGGGGAGGCCGGTACCAACGCCCATCACGGCAGTGGCGACTGT
TTCGTCGCCGAAGCCGACGAAAGCGATGGCTCGCTGTTGCAGTACACACCCCACGTCGCGGTG 
ATCACCAACATCGAGTCCGATCACCTGGACTTCTACGGCAGCGTCGAGGCGTATGTTGCGGTGT 
TCGACTCCTTCGTGGAGCGCATTGTCCCCGGGGGTGCGCTGGTGGTGTGCACTGACGACCCCG 
GAGGGGCCGCGCTGGCTCAGCGCGCGACTGAGCTGGGAATTCGAGTGCTGCGATACGGGTCG 

GTGCCGGGTGAGACCATGGCAGCCACGTTGGTCTCGTGGCAGCAACAGGGGGTCGGCGCGGT
CGCACATATCCGGTTGGCCTCAGAACTAGCCACAGCACAGGGTCCCCGCGTGATGCGGCTGTC 
GGTGCCCGGGCGACACATGGCGCTCAACGCGCTGGGAGCGCTGCTGGCCGCGGTGCAGATCG 
GCGCCCCGGCCGACGAGGTGCTCGACGGGCTGGCCGGCTTCGAAGGAGTGCGGCGACGATTC 
GAACTGG7TGGGACCTGCGGCGTCGGAAAGGCGTCGGTGCGCGTGTTCGATGACTACGCCCAC 

CACCCGACGGAGATCAGCGCGACACTGGCGGCGGCGCGCATGGTGCTCGAACAGGGCGACGG
TGGCCGCTGCATGGTTGTGTTTCAACCCCATTTGTATTCGCGGACAAAGGCATTCGCTGCTGAG 
TTTGGGCGTGCGCTGAATGCCGCTGACGAGGTGTTCGTACTCGACGTCTACGGAGCTCGTGAA 
CAACCGCTGGCCGGTGTCAGCGGAGCCAGCGTCGCTGAGCACGTCACTGTGCCGATGCGCTA 
CGTCCCGGATTTTTCGGCGGTCGCACAGCAAGTGGCCGCCGCCGCTAGTCCGGGCGACGTCAT 

CGTCACGATGGGTGCCGGAGACGTGACCTTGCTGGGCCCGGAAATCCTGACCGCCCTTCGGGT
CCGGGCCAACCGAAGCGCCCCCGGCCGTCCGGGGGTGCTGGGATGA 

>Rv2153c murG TB.seq 2412120:2413349 MW:41829 
>emb|AL123456|MTBH37RV:c2413349-2412117, murG SEQ ID NO:77

GTGAAGGACACGGTCAGCCAGCCGGCCGGCGGGCGCGGGGCAACGGCGCCCCGGCCCGCCG
ATGCCGCCTCGCCGTCTTGTGGTTCCTCGCCGTCTGCTGATTCCGTGTCGGTCGTTCTCGCCGG 
CGGCGGGACCGCCGGGCACGTCGAGCCCGCCATGGCCGTCGCCGACGCCTTGGTCGCGTTGG 
ATCCGCGCGTCCGGATTACCGCGTTGGGCACCCTCCGTGGACTAGAGACCAGGCTGGTGCCCC 
AGCGCGGCTACCACCTGGAGCTGATCACGGC6GTGCCGATGCCGCGCAAGCCCGGCGGCGAC 

CTGGCCCGGCTGCCGTCGCGGGTGTGGCGCGCCGTCCGGGAGGCCCGGGACGTGCTCGACG
ATGTCGACGCCGACGTCGTCGTCGGTTTCGGTGGGTACGTCGCGCTACCGGCTTACCTAGCCG 
CTCGCGGCCTGCCTTT6CCGCCCCGGCGCCGGCGCCGGATCGCGGTGGTGATCCACGAAGCC 
AACGCCAGG6CGGGACTGGCCAACCGGGTCGGCGCCCATACCGCGGACCGGGTGCTCTCCGC

GGTGCCGGATTCCGGGCTGCGGCGCGCCGAGGTGGTTGGGGTCCCGGTCCGTGCGTCGATCG 

CCGCGCTQGACCGCGCGGTGCTGCGAGCCGAGGCGCGGGCACACTTCGGCTTCCCCGACGAC 

GCGCGGGTGCTGCTGGTGTTCGGGGGTTCGCAGGGCGCGGTCTCGCTCAACCGGGCGGTGTC 

CGGCGCCGCCGCCGACCTGGCCGCCGCCGGTGTTTGCGTGCTGCATGCCCATGGACCCCAGA 

ACGTGCTGGAGTTGCGCCGTCGGGCTCAAGGTGACCCACCGTACGTGGCGGTGCCCTATTTGG 

ACCGGATGGAGCTGGCCTACGCCGCCGCCGATCTGGTGATCTGCCGGGCGGGGGCGATGACG 

GTCGCCGAAGTATCCGCCGTCGGTCTGCCGGCCATCTACGTGCCGCTGCCGATCGGCAACGGT 

GAACAGCGGCTGAATGCGTTGCCGGTAGTCAATGCCGGCGGCGGCATGGTGGTCGCCGACGC 

CGCCCTGACCCCCGAGTTGGTGGCCCGCCAGGTTGCCGGGCTGCTCACCGACCCCGCGCGGC 

TGGCCGCGATGACCGCGGCCGCAGCCAGGGTGGGACATCGCGATGCCGCGGGCCAGGTGGC 

CCGGGCCGCGCTGGCCGTCGCCACCGGGGCCGGTGCCAGGACAACGACGTGA 

>Rv2154c ftsW TB.seq 2413349:2414920 MW:56306
>emb|AL123456|MTBH37RV:c2414920-2413346, ftsW SEQ ID NO:78

GTGCTAACCCGGTTGCTGCGTCGGGGCACCAGCGACACCGACGGCTCCCAGACTCGAGGGGC 

CGAGCCGGTCGAGGGGCAGCGGACGGGCCCGGAAGAAGCCTCTAACCCGGGTTCGGCGAGG 

CCCCGCACCCGTTTCGGTGCCTGGCTGGGCCGTCCGATGACCTCGTTTCACCTCATCATCGCC 

GTTGCCGCATTGCTGACCACCCTTGGACTGATCATGGTGCTGTCGGCATCGGCGGTGCGGTCC 

TAGGACGACGACGGATCGGCTTGGGTGATCTTCGGCAAGCAGGTCTTGTGGACGCTTGTGGGT 

CTTATCGGCGGCTATGTCTGTCTGCGGATGTCGGTGCGGTTCATGCGGCGCATCGCCTTCTCCG 

GTTTCGCGATCACCATCGTGATGCTGGTGCTGGTGCTGGTGCCGGGGATCGGCAAGGAGGCCA 

ACGGCTCGCGCGGCTGGTTCGTGGTCGCGGGCTTCTCGATGCAGCCCTCTGAGCTGGCTAAGA 

TGGCGTTCGCCATCTGGGGAGCGCATCTGCTGGCCGCCCGGCGCATGGAACGGGCTTCACTG 

CGCGAGATGCTGATTCCACTGGTGCCGGCCGCCGTCGTTGCGCTGGCGCTGATCGTGGCCCAG 

CCCGACCTCGGACAGACCGTGTCGATGGGCATCATCTTGTTGGGCCTGCTGTGGTATGCGGGG 

CTGCCGCTGCGGGTCTTCCTCAGCTCACTGGCGGCGGTCGTCGTCTCGGCCGCCATCCTGGCG 

GTGTCCGCGGGCTACCGATCCGACCGGGTGCGGTCGTGGCTCAACCCCGAAAACGATCCGCAA 

GACTCCGGCTACCAGGCCCGACAGGCAAAGTTCGCGCTGGCTCMGGTGGCATTTTCGGCGAC 

GGTCTGGGCCAAGGCGTGGCCAAGTGGAACTACTTGCCCAACGCCCACMCGACTTCATTTTCG 

CCATCATCGGCGAAGAGCTGGGTCTCGTCGGCGCGCTCGGACTGCTGGGGCTATTCGGATTGT 

TCGCCTACACCGGCATGCGCATCGCTAGCCGGTCCGCCGACCCGTTCCTGCGGCTGCTGACCG 

CCACCACGACACTGTGGGTGCTGGGACAGGCGTTCATCAACATCGGCTATGTGATCGGGCTGC 

TGCCCGTCACCGGCCTGCAGCTGCCGCTCATCTCGGCCGGTGGAACCTCCACGGCCGCAACAC 

TTTCGCTGATAGGCATCATCGCCAACGCGGCTCGCCACGAACCGGAGGCGGTGGCCGCGCTG 

CGGGCTGGGCGCGACGACAAGGTGAACCGGTTGCTGCGGCTGCCGCTGCCCGAGCCGTATCT 

GCCCCCTCGTCTCGAGGCGTTTCGTGACCGCAAGCGCGCCAACCCGCAACCGGCCCAAACGCA 
GCCCGCGCGGAAGACCCCCCGCACGGCGCCCGGACAGCCTGCCC6GCAGATGGGCCTGCCC
CCGCGACCCGGCTCGCCCCGCACGGCCGATCCGCCGGTTCGTCGATCAGTGCATCATGGAGCT 
GGCCAGCGGTACGCGGGCCAGCGTCGCACACGGCGCGTTCGGGCATTGGAAGGTCAGCGTTA 
CGGGTGA 

>Rv2155c murD TB.seq 2414935:2416392 MW:49314 
>emb|AL123456|MTBH37RV:c2416392-2414932, murD SEQ ID NO:79 

GTGCTTGACCCTCTGGGGCCGGGTGCGCCCGTGTTGGTAGCCGGTGGCCGGGTGACCGGTCA 

GGCGGTGGCCGCGGTGCTGACTCGGTTTGGTGCGACGCCGACGGTGTGCGACGACGATCCGG 

TCATGCTGCGACCGCACGCCGAACGTGGGCTGCCGACCGTTAGTTCCTCGGACGCGGTGCAGC 

AGATAACCGGGTATGCGCTGGTGGTCGCCAGTCCCGGCTTCTCGCCCGCAACCCCGCTACTGG 

CCGCGGCCGCGGCGGCGGGGGTGCCGATCTGGGGTGACGTGGAGTTAGCCTGGCGGCTAGA 

CGCAGCGGGCTGCTACGGACCGCCGCGCAGCTGGCTGGTGGTGACCGGCACCAACGGCAAGA 

CCACCACGACGTCGATGCTGCACGCCATGCTGATCGCCGGTGGCCGCCGCGCCGTGCTGTGC 

GGCAATATCGGCAGTGCGGTGCTGGATGTGCTGGACGAGCCGGCCGAGCTGCTGGCCGTGGA 

GTTGTCCAGTTTCCAGCTGCACTGGGCGCCGTCGCTGCGGCCCGAGGCCGGCGCGGTGCTCA 

ACATTGCCGAAGACGACCTGGACTGGCATGCCACGATGGCCGAATACACCGCGGCCAAGGCCC 

GGGTGCTGACCGGCGGGGTAGCGGTGGCCGGGCTGGATGACAGCCGAGCGGCCGCACTGCT 

GGACGGCTCACCGGCGCAGGTGCGGGTCGGCTTCCGGCTCGGCGAGCCGGCCGCGCGGGAA 

CTGGGCGTGCGCGACGCCCACCTGGTCGATCGCGCCTTCTCCGACGACTTGACGCTGCTGCCG 

GTCGCGTCGATACCGGTGCCAGGTCCGGTCGGCGTGCTTGACGCCCTGGCCGCGGCGGCGCT 

GGCCCGCTCGGTCGGGGTGCCCGCCGGTGCGATCGCCGAGGCGGTCACGTCGTTTCGAGTGG 

GCCGACACCGCGCCGAGGTGGTGGCCGTTGCCGACGGCATCACCTACGTGGACGACTCCAAG 

GCCACCAACCCGCACGCCGCGCGGGCTTCGGTGCTTGCATACCCGAGGGTGGTATGGATCGC 

CGGTGGCCTGCTCAAGGGCGCGTCGCTTCACGCCGAGGTTGCGGCGATGGCGTCGCGGCTGG 

TCGGTGCGGTGCTGATCGGCCGGGATCGCGCAGCGGTTGCCGAGGCGTTATCACGACACGCG 

CCCGATGTCCCAGTCGTTCAGGTTGTGGCAGGCGAGGATACTGGTATGCCTGCGACTGTTGAG 

GTTCCTGTTGCTTGTGTTCTAGATGTGGCAAAAGATGACAAAGCCGGTGAGACCGTTGGCGCTG 

CCGTGATGACCGCTGCGGTGGCCGCGGCCCGGCGGATGGCCCAACCCGGTGACACCGTGCTG 

CTGGCACCGGCCGGCGCCTCATTCGACCAGTTCACCGGTTATGCCGACCGGGGCGAGGCATTC 

GCGACCGCGGTCCGCGCGGTGATCCGGTAG 

>Rv2156c murX TB.seq 2416397:2417473 MW:37714 
>emb|AL123456|MTBH37RV:c2417473-2416394, murX SEQ ID NO:80
ATGAGGCAGATCCTTATCGCCGTTGCCGTAGCGGTGACGGTGTCCATCTTGCTGACCCCGGTG 
CTGATCCGGTTGTTCACTAAGCAGGGCTTCGGCCACCAGATCCGTGAGGATGGCCCGCCCAGC 
CACCACACCAAGCGCGGTACGGCGTCGATGGGCGGGGTGGCGATTCTGGCCGGCATCTGGGC 
GGGCTACCTGGGCGCCCACCTAGCQGGCCTGGCQTTTGACGGTGAAGGCATCGGCGCATCGG
GTCTGTTGGTGCTGGGCCTAGCCACCGCTTTGGGCGGCGTCGGGTTCATCGACGATCTGATCA 
AGATCCGCAGGTCGCGCAATCTCGGGTTGAACAAGACGGCCAAGACCGTCGGGCAGATCACCT 
CCGCCGTGCTGTTTGGCGTGCTGGTGCTGCAGTTCCGGAATGCTGCCGGCCTGACACCGGGCA 

GCGCGGATCTGTCCTACGTGCGTGAGATCGCCACCGTCACATTGGCGCCGGTGCTGTTCGTGT
TGTTCTGCGTGGTCATCGTCAGCGCCTGGTCGAACGCGGTCAACTTCACCGATGGCCTGGACG 
GGCTGGCCGCCGGCACCATGGCGATGGTCACCGCCGCCTACGTGCTGATCACCTTCTGGCAGT 
ACCGCAACGCGTGCGTGACGGCGCCGGGCCTGGGCTGCTACAACGTGCGCGACCCGCTGGAC 
CTGGCGCTCATCGCGGCCGCMCCGCTGGCGCCTGCATCGGTTTTTTGTGGTGGAAGGGCGCG 

CCCGCCAAGATCTTCATGGGTGACACTGGGTCGCTGGCGTTGGGCGGCGTCATCGCGGGGTTG
TCGGTGACCAGCCGCACCGAGATCCTTGCGGTGGTGCTGGGTGCGCTGTTCGTCGCCGAGATC 
ACCTCGGTGGTGTTGCAAATCCTGACCTTCCGGACCACCGGGCGCCGGATGTTTCGGATGGCG 
CCCTTCCACCACCATTTCGAGTTGGTCGGTTGGGCTGAAACCACGGTCATCATCCGGTTCTGGC 
TGCTCACCGCGATCACCTGCGGTCTGGGCGTGGCCTTGTTCTACGGTGAGTGGCTTGCCGCGG 

TCGGTGCCTGA

>Rv2157c murF TB.seq 2417473:2419002 MW:51634 
>emb|AL123456|MTBH37RV:c2419002-2417470, murF SEQ ID NO:81

ATGATCGAGCTGACCGTCGCGCAGATCGCCGAGATCGTCGGGGGCGCAGTGGCCGATATCTCC 

CCGCAAGACGCCGCGCACCGCCGCGTCACCGGGACCGTCGAGTTCGACTCGCGCGCCATCGG
CCCGGGCGGGCTGTTCCTCGCCCTGCCGGGGGCGCGCGCCGACGGGCACGACCATGCCGCG 
TCGGCGGTAGCCGCGGGCGCCGCCGTCGTGCTGGCCGCCCGCCCGGTGGGGGTGCCGGCCA 
TCGTGGTTCCGCCAGTGGCCGCGCCGAACGTATTGGCCGGCGTCCTCGAGCACGACAACGAC 
GGGTCGGGGGCGGCGGTGCTGGCCGCGCTGGCCAAGCTGGCCACCGCGGTGGCCGCGCAGT 

TGGTGGCCGGCGGGCTCACCATCATCGGGATCACCGGCTCGTCGGGCAAGACGTCGACCAAG
GACCTGATGGCCGCCGTGCTGGCCCCGCTGGGGGAGGTGGTGGCCCCGCCCGGATCGTTCAA 
CAACGAGCTGGGTCAGCCGTGGACGGTGCTGCGCGCGACGCGGCGCACCGACTACCTGATTTT 
GGAGATGGCGGCACGCCATCACGGCAACATCGCCGCGCTCGCCGAGATCGCGCCCCCGTCGA 
TCGGAGTCGTGCTCAACGTCGGCACCGCACATTTGGGTGAGTTCGGCTCCCGCGAGGTCATCG 

GACAGACCAAAGCCGAACTGCCGCAGGCTGTTCCGCATTCCGGAGCGGTCGTCCTCAACGCTG
ATGACCCCGCGGTGGCGGCGATGGCCAAGCTGACCGCGGCCCGGGTGGTGCGGGTCAGCCG 
GGACAACACCGGTGACGTTTGGGCGGGGCCGGTGTCGCTGGACGAATTGGCCAGGCCGCGCT 
TTACGCTGCATGCCCACGATGCCCAAGCCGAGGTCCGACTCGGGGTCTGCGGCGACCACCAG 
GTCACTAACGCGCTGTGCGCCGCGGCGGTCGCGCTGGAGTGTGGGGCCAGCGTTGAACAGGT 

CGCGGCCGCGCTGACCGCGGCGCCGCCGGTGTCGCGGCATCGGATGCAGGTGACCACCCGC
GGCGACGGGGTGACGGTGATCGACGACGCCTACAACGCCAACCCCGACTCCATGCGGGCCGG 
GCTGCAGGCGCTGGCCTGGATCGCGCACCAACCCGAGGCCAGCCGCCGCAGCTGGGCGGTGC 
TGGGTGAGATGGCCGAGCTGGGTGAGGACGCGATAGCCGAGCACGATCGCATCGGCCGGCTC
GCGGTGCGCTTAGATGTGTCTCGACTCGTTGTCGTGGGAACCGGGAGGTCGATCAGCGCCATG 
CACCACGGAGCGGTCCTGGAGGGGGCGTGGGGCTCGGGGGAAGCCACTGCTGATCACGGTGC 
GGATCGCACGGCCGTCAATGTGGCCGACGGTGACGCCGCCCTGGCACTACTGCGCGCCGAGC 
TGCGACCCGGGGATGTGGTCTTGGTCAAGGCGTCGAACGCGGCCGGGCTGGGTGCGGTGGCC
GATGCATTGGTCGCAGACGACACATGCGGGAGTGTGCGCCCATGA 

>Rv2158c murE TB.seq 2419002:2420606 MW:55310 
>emb|AL123456|MTBH37RV:c2420606-2418999, murE SEQ ID NO:82 

GTGTCATCGCTGGCCCGAGGGATCTCGCGGCGGCGAACGGAGGTGGCGACACAGGTGGAGGC
TGCGCCCACTGGCTTGCGCCCCAACGCCGTCGTGGGCGTTCGGTTGGCCGCACTGGCCGATCA 
GGTCGGCGCGGCCCTGGCCGAGGGTCCAGCTCAGCGTGCCGTCACCGAGGACCGGACGGTCA 
CCGGGGTCACGCTGCGCGCCCAGGACGTGTCACCCGGTGACGTGTTCGCCGCCCTGACCGGC 
TCGACCACCCACGGGGCCCGCCACGTCGGCGACGCGATCGCACGCGGCGCCGTCGCGGTGCT 

CACCGACCCCGCCGGGGTCGCCGAGATCGCCGGACGAGCGGCCGTGCCCGTGTTGGTGCACC
CCGCACCCCGCGGCGTGCTCGGCGGCTTGGCCGGCACCGTGTACGGGCATCCGTCGGAGCGG 
TTGACGGTTATCGGGATCACCGGAACGTCCGGCAAGACCACCACCACCTATCTGGTCGAGGCC 
GGGTTACGGGCTGCCGGACGCGTCGCCGGGCTGATCGGCACCATCGGCATCCGCGTCGGGGG 
CGCCGACCTTCCCAGCGCGCTGACCACCCCGGAGGCCCCCACGCTGCAGGCGATGCTGGCGG 

CGATGGTCGAACGCGGGGTGGACACCGTGGTCATGGAGGTGTCCAGGCACGCGCTGGCGCTG
GGCCGGGTGGACGGGACCCGGTTCGCCGTCGGCGCCTTCACCAATCTCTCCCGTGACCACCTG 
GATTTCCACCCCAGCATGGCCGACTACTTCGAGGCCAAGGCGTCATTGTTCGATCCGGACTCGG 
CACTGCGCGCCCGCACCGCCGTGGTGTGCATCGACGACGACGCCGGGCGCGCGATGGCGGC 
GCGGGCCGCCGACGCGATCACCGTCAGCGCCGCCGACCGGCCCGCACACTGGCGCGCCACG 

GATGTGGCGCCCACGGACGCGGGCGGGCAACAATTCACCGCCATCGACCCCGCCGGCGTAGG
GCATCACATCGGAATCCGGCTACCGGGCCGCTACAACGTCGCCAATTGCCTGGTCGCCCTGGC 
GATTCTGGACACCGTCGGGGTCTCCCCGGAACAGGCGGTGCCGGGCCTGCGTGAGATCCGGG 
TCCCGGGGCGGCTCGAGCAGATCGACCGCGGCCAGGGCTTTCTCGCGCTGGTCGACTACGCG 
CACAAACCGGAAGCGCTGCGGTCGGTGCTGACCACCTTGGCGCACCCGGACCGCGGGCTGGC 

GGTGGTGTTCGGCGCCGGCGGCGATCGTGACCCGGGCAAGCGGGCCCCGATGGGCCGGATA
GCCGCGCAGCTGGCCGACTTGGTGGTCGTCACCGACGACAACCCGCGTGACGAAGATCCCAC 
GGCGATCCGCCGCGAAATCCTGGCTGGGGCGGCCGAAGTCGGCGGTGATGCCCAGGTCGTCG 
AGATCGCAGACCGGCGGGACGCGATCCGGCACGCGGTTGCCTGGGCGCGCCCCGGCGACGT 
GGTGCTCATCGCCGGCAAAGGCCACGAGACCGGGCAACGCGGCGGCGGGCGGGTCCGCCCG 

TTCGACGACCGGGTGGAGCTGGCTGCCGCGCTAGAGGCCCTCGAGCGGCGCGCATGA

>Rv2159c - TB.seq 2420632:2421663 MW:36377
>emb|AL123456|MTBH37RV:c2421663-2420629, Rv2159c SEQ ID NO:83
ATGAAATTTGTCAACCATATTGAGCCCGTCGCGCCCCGCCGAGCCGGCGGCGCGGTCGCCGAG 
GTCTATGCCGAGGCCCGCCGCGAGTTCGGCCGGCTGCCCGAGCCGCTCGCCATGCTGTCCCC 
GGACGAGGGACTGCTCACCGCCGGCTGGGCGACGTTGCGCGAGACACTGCTGGTGGGCCAGG 
TGCCGCGTGGCCGCAAGGAAGCCGTCGCCGCCGCCGTCGCGGCCAGCCTGCGCTGCCCCTGG
TGCGTCGACGCACACACCACCATGCTGTACGCGGCAGGCCAAACCGACACCGCCGCGGCGAT 
CTTGGCCGGCACAGCACCTGCCGCCGGTGACCCGAACGCGCCGTATGTGGCGTGGGCGGCAG 
GAACCGGGACACCGGCGGGAGCGCCGGCACCGTTCGGCCCGGATGTCGCCGCCGAATACCTG 
GGCACCGCGGTGCAATTCCACTTCATCGCACGCCTGGTCCTGGTGCTGCTGGACGAAACCTTC 

CTGCCGGGGGGCCCGCGCGCCCAACAGCTCATGCGCCGCGCCGGTGGACTGGTGTTCGCCCG
CAAGGTGCGCGCGGAGCATCGGCCGGGCCGCTCCACCCGCCGGCTCGAGCCGCGAACGCTG 
CCCGACGATCTGGCATGGGCAACACCGTCCGAGCCCATAGCAACCGCGTTCGCCGCGCTCAGC 
CACCACCTGGACACCGCGCCGCACCTGCCGCCACCGACTCGTCAGGTGGTCAGGCGGGTCGT 
GGGGTCGTGGCACGGCGAGCCAATGCCGATGAGCAGTCGCTGGACGAACGAGCACACCGCCG 

AGCTGCCCGCCGACCTGCACGCGCCCACCCGTCTTGCCCTGCTGACCGGCCTGGCCCCGCAT
CAGGTGACCGACGACGACGTCGCCGCGGCCCGATCCCTGCTCGACACCGATGCGGCGCTGGT 
TGGCGCGCTGGCCTGGGCCGCCTTCACCGCCGCGCGGCGCATCGGCACCTGGATCGGCGCCG 
CGGGCGAGGGCCAGGTGTCGCGGCAAAACCCGACTGGGTGA 

>Rv2163c pbpB TB.seq 2425049:2427085 MW:72506

>emb|AL123456|MTBH37RV:c2427085-2425046, pbpB SEQ ID NO:84 

GTGAGCCGCGCCGCCCCCAGGCGGGCCAGTCAGTCGCAGTCGACGCGACCGGCGCGCGGTTT 
GCGCCGGCCACCGGGAGCCCAGGAGGTTGGGCAACGCAAACGGCCCGGCAAAACGCAGAAAG 
CCCGGCAAGCCCAGGAAGCCACGAAATCCCGGCCTGCGACACGGTCAGACGTCGCACCCGCG 

GGTCGCTCGACTCGTGCGAGGCGCACCCGGCAGGTGGTGGACGTCGGGACGCGCGGTGCGTC
GTTCGTCTTTCGGCATCGGACCGGAAACGCGGTCATCTTGGTGTTGATGTTGGTCGCGGCAACA 
CAATTGTTCTTTCTGCAGGTATCACATGCCGCGGGCCTGCGTGCGCAGGCGGCCGGCCAACTC 
AAGGTCACCGACGTCCAGCCAGCGGCTCGCGGCAGCATCGTCGACCGCAACAATGACCGGCTC 
GCGTTCACCATCGAGGCGCGTGGCCTGACGTTCCAGCCGAAGCGGATTCGGCGGCAATTGGAA 

GAGGCCAGGAAGAAGACGTCGGCTGCACCCGACCCGCAGCAGCGCCTGCGCGATATCGCCCA
GGAGGTCGCCGGCAAGCTGAACAACAAGCCAGATGCCGCGGCCGTGCTGAAGAAGCTGCAAA 
GCGACGAGACCTTCGTCTACTTGGCGCGTGCGGTCGACCCGGCTGTCGCCAGCGCGATCTGCG 
CGAAGTATCCCGAGGTCGGTGCGGAAAGACAGGATCTGCGTCAGTACCCGGGTGGGTCGCTG 
GCGGCAAACGTCGTCGGTGGCATCGACTGGGATGGTCATGGGCTGCTGGGTCTGGAGGACTCC 

CTGGATGCGGTGCTGGCCGGAACCGACGGATCGGTCACCTACGACCGTGGGTCAGACGGCGT
CGTCATCCCCGGCAGCTACCGGAATCGGCACAAGGCGGTCCACGGTTCCACCGTCGTGCTCAC 
CCTCGACAACGACATGCAGTTCTACGTGCAGCAGCAGGTGCAGCAGGCCAAGAACCTATCGGG 
GGCTCACAACGTCTCGGCCGTCGTCCTGGACGCCAAQACC6GCGAGGTGCTCGCGATGGCCA
ACGACAACACCTTCGACCCGTCGCAAGACATCGGGCGCCAGGGCGACAAGCAGTTGGGCAACC 
CGGCGGTGTCGTCGCCCTTCGAGCCGGGCTCGGTGAACAAGATCGTCGCCGCGTCCGCGGTC 
ATCGAGCACGGGTTGAGCAGCCCCGACGAGGTGCTACAGGTGCCTGGCTCGATCCAGATGGG 
CGGTGTTACCGTGCATGACGCTTGGGAGCACGGCGTGATGCCCTATACCACCACGGGGGTGTT
CGGAAAGTCCTCCAACGTCGGCACGCTGATGCTTTCCCAACGTGTCGGACCGGAACGCTATTAC 
GATATGCTCCGCAAGTTCGGGTTGGGACAGCGCACCGGCGTGGGCCTGCCCGGTGAGAGCGC 
CGGACTGGTGCCGCCAATCGACCAGTGGTCGGGCAGTACGTTCGCTAATCTTCCTATTGGCCAA 
GGTCTTTCGATGACTTTGCTGCAGATGACCGGCATGTACCAGGCCATCGCCAACGATGGAGTGC 

GGGTACCCCCACGCATTATCAAGGCJCACCGTCGCACCCGACGGCAGCCGAACCGAAGAACCGC
GCCCCGACGACATTCGCGTGGTGTCGGCGCAGACCGCCCAGACC6TGCGGCAGATGCTGCGT 
GCCGTGGTGCAACGCGATCCGATGGGCTACCAGCAGGGTACCGGGCCGACGGCCGGGGTGCC 
CGGCTATCAGAT6GCCGGCAAGACCGGTACCGCGCAGCAGATCAACCCTGGCTGCGGCTGCTA 
CTTCGACGACGTGTATTGGATCACCTTCGCCGGAATCGCCACTGCCGACAATCCCCGCTACGTG 

ATCGGCATCATGTTGGACAACCCGGCGCGCAACTCCGACGGCGCGCCTGGGCACTCGGCCGC
CCCGCTGTTCCACAACATCGCGGGCTGGCTGATGCAGCGCGAAAACGTCCCGCTGTCACCCGA 
TCCCGGGCCTCCTTTGGTCTTGCAGGCCACCTAG 

>Rv2165c - TB.seq 2428236:2429423 MW:42498 

>emb|AL123456|MTBH37RV:c2429423-2428233, Rv2165c SEQ ID NO:85

GTGCAAACCCGTGCACCGTGGTCTCTGCCCGAAGCGACCCTGGCGTACTTCCCCAACGCCAGG 
TTCGTGTCTTCGGACAGGGACCTCGGTGCAGGG6CGGCGCCTGGAATAGCCGCGTCCCGAAGT 
ACGGCTTGCCAGACCTGGGGAGGTATCACGGTGGCTGATCCAGGTTCGGGGCCAACCGGTTTC 
GGTCATGTGCCGGTATTGGCGCAACGTTGCTTCGAACTGCTTACCCCCGCACTAACCCGCTACT 

ATCCAGACGGCTCGCAGGCGGTCCTTCTCGACGCGACCATCGGCGCGGGCGGGCATGCGGAG
CGGTTTTTGGAGGGATTGCCGGGTCTGCGCCTGATCGGGCTCGACCGTGACCCAACCGCTCTG 
GACGTCGCGCGGTCTCGGCTGGTGCGA7TCGCTGACCGACTTACCCTGGTGCACACCCGCTAT 
GACTGTCTGGGCGCAGCGCTGGCTGAATCGGGTTATGCCGCAGTGGGATCAGTCGACGGAATC 
CTGTTCGATCTCGGCGTCTCATCCATGCAGCTCGACCGCGCCGAGCGGGGCTTCGCCTACGCC 

ACGGACGCGCCATTGGACATGCGGATGGACCCGACGACGCCGTTGACCGCAGCTGACATTGTC
AACACTTACGACGAGGCGGCACTAGCCGACATCCTGCGTCGCTACGGAGAGGAGCGGTTTGCT 
CGGCGCATCGCTGCCGGTATCGTCCGCCGACGCGCAAAAACCCCGTTCACCTCGACCGCCGAA 
CTGGTTGCCCTGCTGTACCAGGCGATTCCAGCTCCGGCCCGGCGTGTCGGCGGGCATCCAGCC 
AAGCGAACATTCCAGGCGCTGCGCATCGCGGTCAACGATGAGCTGGAATCGCTGCGCACGGCC 

GTTCCTGCCGCGCTGGATGCCCTCGCTATCGGTGGGCGCATCGCGGTGCTGGCCTAGGA6TCG
CTAGAGGACAGGATCGTCAAACGGGTGTTCGCCGAGGCAGTCGCGTCGGCCACCCCTGCGGG 
ACTTCCGGTCGAACTTCCCGGCCATGAGCCGCGATTCCGTTCGTTAACGCACGGCGCCGAACG 
AGCGAGTGTGGCTGAGATCGAACGCAATCCCCQCAGTACTCCAGTGCGGTTGCGGGCCCTGCA
ACGAGTCGAGCACCGGGCGCAATCGCAGCAATGGGCAACCGAGAAGGGTGATTCATGA 

>Rv2166c - TB.seq 2429428:2429856 MW:15912 
>emb|AL123456|MTBH37RV:c2429856-2429425, Rv2166c SEQ ID NO:86

ATGTTTCTCGGCACCTACACGCCCAAACTCGACGACAAGGGGCGGCTGACGCTGCCGGCCAAG 
TTTCGCGACGCGTTGGCAGGGGGGTTGATGGTCACCAAGAGCCAAGATCACAGCCTGGCCGTT 
TACCCGCGGGCGGCGTTCGAGCAGCTGGCGCGCCGGGCCAGCAAGGCGCCACGAAGCAACC 
CCGAGGCGAGAGCGTTCCTACGTAATCTCGCCGCCGGTACCGACGAACAGCATCCCGACAGTC 
AAGGCCGGATCACCTTGTCGGCCGACCACCGCCGCTACGCAA6CCTTTCCAAGGACTGTGTGG
TGATCGGCGCGGTCGACTATCTCGAGATCTGGGATGCGCAAGCCTGGCAGAACTACCAACAAAT 
CCATGAAGAGAACTTCTCCGCGGCCAGCGATGAAGCACTCGGTGACATCTTCTGA 

>Rv2197c - TB.seq 2461505:2462146 MW:22481 
>emb|AL123456|MTBH37RV:c2462146-2461502, Rv2197c SEQ ID NO:87

ATGGTGAGCAGATATTCCGCATACCGGCGTGGGCCGGATGTAATCTCGCCGGACGTCATCGAT 
CGCATCCTGGTTGGGGCATGTGCCGCGGTGTGGCTGGTGTTCACCGGCGTGTCGGTGGCCGC 
CGCTGTCGCCCTGATGGACCTGGGTAGGGGCTTCCACGAGATGGCCGGAAAGCCGCACACCAC 
GTGGGTGCTGTACGCCGTAATTGTGGTCTCCGCACTGGTCATCGTGGGCGCGATACCGGTGCT 
GTTGCGAGCTCGCCGCATGGCTGAGGCCGAGCCCGCGACGAGGCCGACGGGTGCATCCGTGC
GGGGCGGGCGATCGATCGGATCCGGGCATCCGGCGAAACGCGCTGTGGCCGAGTCGGCACCC 
GTACAGCACGCGGATGCATTCGAGGTGGCCGCCGAGTGGTCCAGTGAGGCGGTGGACCGGAT 
CTGGTTGCGCGGGACAGTCGTGTTGACCAGTGCGATTGGCATTGCGTTGATTGCCGTGGCGGC 
G6CGACCTACCTCATGGCGGTCGGTCACGACGGGCCATCTTGGATCAGCTACGGGTTGGCCGG 
GGTGGTCACCGCGGGCATGCCGGTGATCGAGTGGCTATACGCTCGGCAGCTGCGCCGGGTGG
TGGCGCCCCAGTCCAGTTAG 

>Rv2198c - TB.seq 2462149:2463045 MW:30955 
>emb|AL123456|MTBH37RV:c2463045-2462146, mmpS3 SEQ ID NO:88 

ATGAGCGGGCCGAATCCCCCGGGACGGGAACCTGACGAACCCGAATCGGAACCCGTCAGCGA
CACGGGCGACGAACGGGCTTCCGGCAACCACTTGCCGCCCGTCGCCGGGGGCGGCGACAAAC 
TGCCCAGTGACCAGACGGGCGAGACCGACGCATATTCTCGGGCATACTCTGCCCCGGAATCCG 
AGCACGTCACCGGCGGCCCGTATGTGCCAGCCGATCTCAGGCTCTATGACTACGACGACTATG 
AGGAGTCGTCCGACCTGGACGACGAACTGGCCGCTCCGCGCTGGCCGTGGGTGGTCGGTGTC 

GCCGCCATMTTGCCGCCGTTGCGCTCGTGGTTTCGGTGTCGTTGCTCGTCACGCGACCACATA
CCAGCAAACTCGCCACCGGCGACACTACGTCCTCTGCACCGCCCGTGCAGGACGAAATCACGA 
CCACCAAGCCGGCGCCGCCACCGCCGCCACCAGCCCCACCGCCCACCACCGAGATCCCGACA 
GCGACGGAGACACAGACGGTCACTGTGACGCCGCCACCACCGCCCCCACCGGCGACAACCAC
GGCGCCGCCGCCGGCGACCACCACAACGGCGGCGGCACCGCCGCCCACGACCACCACGCCG 
ACCGGTCCGCGGCAAGTCACCTATTCGGTGACCGGTACCAAGGCGCCGGGTGACATTATCTCG 
GTGACTTACGTCGATGCCGCCGGGCGCCGACGGACACAGCACAATGTGTACATCCCGTGGTCC 
ATGACGGTCACCCCGATCTCGCAATCCGACGTTGGCTCGGTGGAGGCCTCCAGCCTTTTCCGG
GTCAGCAAACTCAACTGCTCGATCACCACGAGCGACGGAACGGTGCTCTCATCGAACTCCAACG 
ATGGACCGCAAACGAGCTGCTGA 
>Rv2199c - TB.seq 2463234:2463650 MW:14866 

>emb|AL123456|MTBH37RV:c2463650-2463231, Rv2199c SEQ ID NO:89
ATGCATATCGAAGCCCGACTGTTTGAGTTTGTCGCCGCGTTCTTCGTGGTGACGGCGGTGCTGT
ACGGCGTGTTGACCTCGATGTTCGCCACCGGTGGTGTCGAGTGGGCTGGCACCACTGCGCTGG 
CGCTTACCGGCGGCATGGCGTTGATCGTCGCCACCTTCTTCCGGTTTGTGGCCCGCCGGTTAG 
ATTCCCGGCCCGAGGACTACGAAGGCGCTGAAATCAGCGACGGCGCAGGAGAACTTGGATTCT 
TCAGTCCGCATAGCTGGTGGCCGATCATGGTCGCGTTGTCCGGCTCGGTGGCAGCGGTCGGCA 
TCGCGTTGTGGCTCCCGTGGCTGATCGCCGCCGGTGTGGCATTCATCCTCGCCTCGGCGGCCG
GATTGGTCTTCGAATATTACGTCGGTCCTGAGAAGCACTGA 

>Rv2200c ctaC TB.seq 2463661:2464749 MW:40449 
>emb|AL123456|MTBH37RV:c2464749-2463658, ctaC SEQ ID NO:90

GTGACACCTCGCGGGCCAGGTCGTTTGCAACGCTTGTCGCAGTGCAGGCCTCAGCGCGGCTCC
GGAGGGCCTGCCCGTGGTCTTCGACAGCTGGCGCTCGCAGCAATGCTGGGGGCATTGGCCGT 
CACCGTCAGTGGATGCAGCTGGTCGGAAGCCCTGGGCATCGGTTGGCCGGAGGGCATTACCC 
CGGAGGCACACCTCAATCGAGAACTGTGGATCGGGGCGGTGATCGCCTCCCTGGCGGTTGGG 
GTMTCGTGTGG6GTCTCATCTTCTGGTCCGCGGTATTTCACCGGAAGAAGAACACCGACACTG 

AGTTGCCCCGCCAGTTCGGCTACAACATGCCGCTAGAGCTGGTTCTCACCGTCATACCGTTCCT
CATCATCTCGGTGCTGTTTTATTTCACCGTCGTGGTGCAGGAGAAGATGCTGCAGATAGCCAAG 
GATCCCGAGGTCGTGATTGATATCACGTCTTTCCAGTGGMTTGGAAGTTTGGCTATCAAAGGGT 
GAACTTCAAAGACGGCACACTGACCTATGATGGTGCCGATCCGGAGCGCAAGCGCGCCATGGT 
TTCCAAGCCAGAGGGCAAGGACAAGTACGGCGAAGAGCTGGTCGGGCCGGTGCGCGGGCTCA 

ACACCGAGGACCGGACCTACCTGAATTTCGACAAGGTCGAGACGTTGGGCACCAGCACCGAAA
TTCCGGTGCTGGTGGTGCCGTCCGGCAAGCGTATCGAATTCCAAATGGCCTCAGCCGATGTGAT 
ACACGCATTCTGGGTGCCGGAGTTCTTGTTCAAGCGTGACGTGATGCCTAACCCGGTGGCAAAC 
AACTCGGTCAACGTCTTCCAGATCGAAGAAATCACCAAGACCGGAGCATTCGTGGGCCACTGCG 
CCGAGATGTGTGGCACGTATCACTCGATGATGAACTTCGAGGTCCGCGTCGTGACCCCCAACG 

ATTTCAAGGCCTACCTGCAGCAACGCATCGACGGGAAGACAAACGCCGAGGGCCTGCGGGCGA
TCAACCAGCCGCCCCTTGCGGTGACCACCCACCCGTTTGATACTCGCCGCGGTGAATTGGCCC 
CGCAGCCCGTAGGTTAG 
>Rv2427c proA g-glutamyl phosphate reductase TB.seq 2724231:2725475 MW:43746
>emb|AL123456|MTBH37RV:c2725475-2724228, proA SEQ ID NO:91 

ATGACCGTGCCAGCACCGTCGCAGCTCGACTTGCGTCAAGAGGTGCACGACGCCGCACGCCG 

CGCCCGGGTGGCCGCCCGCCGGCTGGCATCGCTGCCGACGACTGTCAAAGACCGCGCGCTGC
ACGCGGCTGCCGACGAGCTACTGGCTCACCGCGACCAGATCCTGGCGGCCAACGCCGAAGAC 
CTGAACGCGGCGCGGGAGGCGGACACCCCGGCCGCCATGCTGGACCGGTTGTCCTTGAACCC 
GCAACGAGTCGACGGTATCGCCGCCGGGTTGCGGCAAGTCGCGGGACTGCGCGATCCGGTCG 
GTGAAGTGCTGCGTGGCTATAGCCTGCCCAACGGGCTGCAGCTGCGCCAGCAGCGCGTCCCCC 

TGGGCGTGGTCGGCATGATCTACGAGGGCCGCCCCAATGTCACCGTGGATGCCTTCGGGCTGA
CACTCAAGTCGGGTAACGCTGCATTGCTGCGCGGCAGCTCGTCGGCCGCAAAGTCCAACGAGG 
CCCTGGTGGCGGTGTTACGCACCGCGCTGGTCGGCCTGGAGCTGCCGGCCGACGCGGTCCAG 
CTGCTGTCGGCTGCCGACCGCGCCACCGTCACTCACCTGATTCAGGCCCGCGGCCTGGTCGAT 
GTGGTGATTCCACGCGGGGGAGCGGGCCTGATCGAGGCGGTCGTACGCGATGCCCAGGTGCC 

CACCATCGAGACCGGCGTCGGGAACTGCCATGTCTACGTGCACCAAGCGGCCGACCTGGACGT
GGCCGAGCGTATCTTGCTGAACTCCAAGACGCGGCGGCCCAGCGTCTGCAACGCCGCCGAGA 
CGCTGCTGGTCGACGCAGCGATCGCCGAAACGGCGTTGCCTCGATTGCTGGCCGCCCTGCAGC 
ACGCCGGTGTCACCGTACATCTCGACCCGGACGAGGCCGACCTGCGCCGCGAATACCTGTCGC 
TGGACATCGCGGTGGCGGTGGTCGACGGTGTCGACGCTGCCATCGCCCATATCAACGAATACG 

GCACCGGGCACACAGAAGCGATTGTGACCACCAATCTTGATGCGGCCCAACGCTTTACCGAACA
GATCGATGCGGCCGCGGTGATGGTGAACGCATCAACGGCGTTCACCGACGGCGAGCAATTCGG 
CTTCGGCGCCGAGATCGGCATCTCCACCCAGAAACTGCATGCCCGCGGACCGATGGGACTACC 
GGAATTGACGTCGACCAAGTGGATCGCATGGGGAGCCGGCCACACCCGTCCGGCCTGA 

>Rv2438c - similar to YHN4_YEAST P38795 TB.seq 2734793:2737006 MW:80492
>emb|AL123456|MTBH37RV:c2737006-2734790, Rv2438c SEQ ID NO:92

ATGGGACTGCTCGGCGGCCAATCAGGGCCCAGGGTCGGCAGCGGCCCAGTCGGTAGCATCCC 
CACGCCGGTCAATGCCGCCATCTGCCAGCAGCGCGGGGGATTCCACGGTGTCGAGCGTGGAT 
ACTCGGCGGGTGATTCGGGCGTTCTGACGTCGCTGGGCGACAATGAAAGGACGATGAACTTTT 

ACTCCGGCTACCAGCACGGGTTCGTGCGCGTTGCCGCCTGCACTCACCACACCACCATCGGTG
ACCCGGCGGCCAACGCCGCGTCGGTATTGGACATGGCCCGTGCGTGCCACGACGATGGCGCA 
GCGTTGGCGGTCTTTCCTGAGCTGACGCTGTCGGGCTACTCCATCGAGGACGTACTACTGCAG 
GACTCTCTGCTCGATGCCGTCGAGGACGCGCTGCTCGACCTGGTGACCGAATCeGCCGACCTG 
TTACCTGTACTGGTGGTCGGGGCTCCGCTGCGGCATCGACACCGCATCTACAACACCGCGGTC 

GTCATTCACCGCGGCGCCGTGCTCGGCGTGGTGCCCAAGTCGTATCTACCCACCTATCGCGAG
TTCTACGAGCGGCGCCAGATGGCGCCCGGAGACGGGGAGCGGGGCACGATCCGCATCGGTGG 
CGCCGACGTGGCCTTCGGCACGGACCTGTTGTTCGCCGCGTCAGATCTACCCGGCTTTGTGTT 
GCATGT6GAGATCTGCGAGGACATGTTTGTGCCGATGCCGCCCAGCGCCGAGGCGGCCCT6G
CGGGCGCGACGGTGCTGGCGAATCTGTCCGGCAGCCCGATCACCATCGGCCGTGCCGAGGAC 
CGCCGGCTGCTTGCGCGCTCGGCGTCGGCGCGGTGTCTGGCTGCCTATGTCTATGCCGCCGC 
GGGGGAGGGGGAGTCAACGACGGACCTGGCCTGGGACGGTCAGACGATGATCTGGGAGAATG 

GCGCACTGCTCGCGGAGTCCGMCGTTTCCCCAAAGGAGTGCGCCGCAGTGTCGCCGACGTTG
ACACCGAGTTGCTTCGGTCGGAGCGGCTGCGGATGGGCACGTTCGACGACAACCGGCGTCAC 
CACCGGGAGTTAACGGAATCGTTCCX3GCGCATCGACTTCGCACTCGACCCACCGGCAGGCGAC 
ATCGGACTGCTGCGCGAGGTCGAGCGGTTCCCGTTCGTTCCGGCCGATCCGCAACGATTGCAA 
CAGGATTGCTACGAGGCCTACAACATCCAGGTGTCTGGACTCGAGCAACGGTTGCGGGCGCTG 

GAGTATCCGAAGGTCGTTATCGGTGTGTCCGGGGGATTGGACTCGACGCACGCGCTGATCGTC
GCGACCCATGCCATGGACCGCGAGGGCCGGCCGC6CAGCGACATTCTGGCGTTTGCGTTGCC 
CGGATTCGCCACCGGGGAGCACACTAAGAACAACGCGATCAAGCTGGCACGTGCGCTGGGGG 
TTACCTTCTCCGAAATCGATATCGGCGACACCGCTCGGTTGATGCTGCACACAATCGGCCATCC 
GTATTCGGTTGGCGAAAAAGTGTACGACGTCACCTTCGAGAACGTCCAGGCCGGGTTGCGCAC 

CGACTATCTTTTCCGTATCGCCAAGCAGCGCGGGGGAATCGTACTGGGCACCGGGGACCTGTC
GGAGGTGGCACTGGGTTGGTCGACATACGGTGTCGGCGACCAGATGTCGCACTACAACGTCAA 
CGCCGGTGTGCCCAAGACGCTGATCCAGCACCTGATCCGGTGGGTCATTTCGGCGGGTGAGTT 
CGGTGAGAAGGTGGGTGAGGTATTGCAGTCGGTGCTCGACACCGAGATCACCCCCGAACTCAT 
TCCGACCGGCGAGGAGGAGCTGCAGAGCAGCGAGGCCAAGGTCGGACCTTTCGCCCTACAGG 

ACTTTTCGCTTTTTCAGGTAGTGCGCTACGGATTTCGCCCGTCGAAGATTGCGTTTTTGGCCTGG
CATGCGTGGAACGATGCGGAGCGGGGCAACTGGCCGCCCGGCTTCCCAAAGAGCGAACGCCC 
GTCCTATrCATTGGCCGAAATCCGGCATTGGCTGCAGATTTTGGTCCAGCGGTTTTATTCGTtTA 
GCCAGTTCAAGCGTTCGGCATTGCCCAACGGCCCCAAGGTGTCCCACGGGGGCGCGTTGTCGC 
CGCGTGGGGATTGGCGGGCCCCGTCGGATATGTCAGCGCGAATCTGGCTCGATCAGATCGACC 

GTGAGGTGCCCAAGGGCTAG

>Rv2439c proB glutamate 5-kinase TB.seq 2737118:2738245 MW:38789
>emb|AL123456|MTBH37RV:c2738245-2737115, proB SEQ ID NO:93

ATGAGAAGTCCGCATCGGGACGCAATCCGGACCGCGCGCGGCCTTGTCGTGAAGGTCGGGAC 
CACG6CGCTTAGCACACCGTCCGGGATGTTCGATGCCGGCCGGCTGGCCGGACTGGCCGAGG
CGGTCGAGCGGCGGATGAAGGCGGGTTCCGACGTCGTCATCGTGTCTTCGGGCGCCATCGCC 
GCCGGCATCGAGCCGCTCGGGCTGTCCCGTCGTCCCAAAGATCTGGCGACCAAGCAGGCGGC 
GGCCAGCGTCGGGCAGGTCGCGCTGGTGAACTCGTGGAGCGCGGCGTTCGCCCGCTACGGCC 
GCACGGTGGGCCAGGTGCTGCTGACCGCGCACGACATTTCGATGCGGGTGCAGCACACCAAC 
GCCCAACGCACGGTGGATCGGCTGCGCGCGTTGCACGCGGTGGCGATTGTCAACGAGAACGA
CACCGTGGCCACCAACGAGATCCGGTTCGGTGACAACGATCGGCTGTCTGCACTGGTGGCGCA 
CCTGGTCGGCGCCGACGCTTTGGTGCTGCTGTCGGACATC6ACGGCCTCTACGACTGCGACCC 
GCGCAAAACCGCGGACGCGACGTTCATTCCGGAGGTGTCCGGGCCGGCGGATCTGGACGGTG
TGGTCGCCGGCCGCAGTAGCCACCTGGGTACTGGCGGCATGGCGTCCAAGGTGGCGGCGGCG 
CTGTTGGCCGCCGACGCCGGGGTGCCGGTACTGCTGGCCCCCGCGGCCGACGCCGCGACCG 
CGCTCGCCGACGCGTCGGTGGGCACGGTGTTTGCGGCCCGGCCCGCGCGTCTGTCGGCCCGG 
CGGTTCTGGGTGCGTTATGCCGGCGAAGCAACCGGCGCACTGACTCTCGACGCCGGTGCGGTG
CGCGCTGTGGTGCGACAACGCCGGTCACTGCTGGCGGCGGGTATCACCGCGGTGTCCGGCCG 
GTTTTGCGGCGGCGATGTGGTCGAACTGCGTGCACCCGACGCGGCCATGGTAGCCCGCGGGG 
TGGTTGCCTACGACGCGTCCGAGCTGGCCACCATGGTGGGCCGGTCCACCTCTGAGCTACCCG 
GCGAGCTGCGCCGCCCGGTGGTGCACGCCGACGATCTGGTCGCGGTGTCGGCGAAGCAAGCT 
AAGCAAGTTTAG

>Rv2440c obg Obg GTP-binding protein TB.seq 2738248:2739684 MW:50430 
>emb|AL123456|MTBH37RV:c2739684-2738245, obg SEQ ID NO:94 

GTGCCTCGGTTTGTCGATCGGGTCGTCATCCACACCAGAGCGGGTTCGGGCGGTAACGGCTGC 

GCTTCGGTCCATCGCGAGAAATTCAAGCGGCTGGGCGGCCCCGATGGCGGAAATGGCGGCCG
GGGCGGCAGCATCGTCTTCGTCGTCGATCCGCAAGTGCACACCCTGCTCGACTTCCATTTCCGC 
CCGCATCTCACCGCGGCTTCGGGCAAGCACGGGATGGGCAATAACCGCGACGGGGCCGCCGG 
CGCGGATTTGGAAGTGAAAGTTCCCGAAGGCACCGTGGTATTGGACGAGAACGGCCGGCTACT 
GGCCGACCTGGTCGGCGCGGGCACCCGCTTTGAAGCCGCCGCCGGAGGCCGTGGCGGTTTGG 

GCAACGCCGCGCTGGCTTCCCGCGTGCGTAAGGCCCCCGGTTTCGCACTCCTCGGCGAAAAGG
GACAGTCCCGAGACCTCACCTTGGAACTCAAGACCGTCGCCGACGTCGGCCTGGTCGGGTTTC 
CGTCGGCCGGAAAATCCTCGCTGGTGTCGGCGATTTCGGCGGCCAAGCCGAAGATCGCCGACT 
ACCCGTTCACCACCCTGGTGCCCAACCTCGGTGTGGTCTCGGCTGGCGAGCACGCGTTCACCG 
TCGCCGACGTGCCGGGGTTGATCCCGGGCGCATCCCGGGGCCGTGGTCTGGGGCTGGACTTT 

CTGCGGCACATCGAGCGCTGCGCTGTACTGGTGCATGTGGTGGATTGCGCTACCGCCGAGCCG
GGCCGCGACCCCATCTCGGACATCGACGCGCTGGAAACGGAACTCGCGTGCTACACGCCCAC 
GCTGCAAGGGGACGCGGCTCTGGGCGATCTCGCCGCACGGCCGCGTGCGGTGGTCCTCAACA 
AAATCGATGTGCCGGAGGCCCGCGAGCTCGCGGAGTTCGTCCGTGACGACATCGCCCAGCGC 
GGCTGGCCGGTGTTCTGCGTGTCGACCGCAACCCGGGAAAACCTGCAGCCGTTGATCTTTGGG 

CTGTCGCAGATGATCTCGGACTACAACGCTGCGCGGCCGGTGGCGGTGCCACGGCGGCCGGT
GATrCGTCCGATTCCGGTGGACGACAGCGGTTTTACCGTCGAACCCGACGGGCATGGTGGGTT 
TGTCGTCAGCGGTGCCCGGCCCGAGCGTTGGATTGACCAGACCAACTTCGACAACGACGAGGC 
CGTCGGCTATCTCGCCGACCGGCTGGCGCGCCTGGGTGTCGAGGAGGAATTGCTGAGGCTGG 
GTGCGCGGTCAGGATGCGCGGTGACCATCGGCGAGATGACGTTCGATTGGGAGCCGCAAACG 

CCTGCGGGTGAGCCGGTCGCGATGTCCGGCCGGGGCACCGATCCGCGGCTGGACAGCAAGAA
GCGGGTGGGCGCGGCCGAGCGAAAGGCCGCTCGGAGTCGGCGTCGCGAACACGGGGATGGC 
TGA 
>Rv2441c rpmA 50S ribosomal protein L27 TB.seq 2739773:2740030 MW:8969
>emb|AL123456|MTBH37RV:c2740030-2739770, rpmA SEQ ID NO:95 

ATGGCACACAAGAAGGGGGCTTCCAGCTCGCGCAACGGTCGCGATTCCGCCGCCCAGCGGCT 
GGGGGTTAAGCGGTACGGCGGCCAGGTCGTCAAGGCCGGCGAGATCCTGGTCCGCCAGCGCG
GTACCAAATTCCATCCCGGCGTCAACGTCGGGCGTGGCGGCGATGACACCTTGTTCGCCAAGA 
CGGCCGGGGCGGTCGAGTTCGGCATCAAACGCGGACGTAAGACGGTGAGCATCGTCGGTTCG 
ACCACTGCCTGA 

>Rv2442c rplU 50S ribosomal protein L21 TB.seq 2740048:2740359 MW:11152
>emb|AL123456|MTBH37RV:c2740359-2740045, rplU SEQ ID NO:96

ATGATGGCGACCTACGCAATCGTCAAGACCGGCGGCAAGCAGTACAAAGTCGCTGTCGGAGAT 
GTGGTCAAGGTCGAAAAGCTGGAATCCGAGCAGGGGGAGAAGGTGTCCCTGCCGGTGGCTCT 
GGTTGTCGACGGCGCCACCGTCACCACCGATGCGAAGGCACTGGCCAAGGTCGCGGTGACCG 
GTGAGGTGCTCGGGCACACCAAGGGCCCCAAGATCCGTATCCACAAGTTCAAGAACAAGACTG
GCTACCACAAACGGCAGGGACACCGTCAGCAGCTGACGGTCCTGAAGGTCACCGGCATCGCAT 
AA 

>Rv2448c valS valyl-tRNA synthase TB.seq 2747596:2750223 MW:97822 

>emb|AL123456|MTBH37RV:c2750223-2747593, valS SEQ ID NO:97

ATGCTGCCCAAGTCGTGGGATCCGGCCGCGATGGAGAGCGCCATCTATCAGAAGTGGCTGGAC 
GCTGGCTACTTCACCGCGGACCCGACCAGCACCAAGCCGGCCTATTCGATCGTGCTGCCGCCG 
CCGAACGTGACCGGCAGCCTGCACATGGGCCACGCGCTGGAACACACCATGATGGACGCCTTG 
ACGCGGCGCAAGCGGATGCAGGGCTATGAGGTGCTCTGGCAGCCGGGCACCGACCATGCCGG 

GATCGCCACCCAGAGCGTGGTCGAGCAGCAGCTGGCGGTCGACGGCAAGACTAAAGAAGACCT
CGGCCGCGAGCTGTTCGTGGACAAGGTGTGGGATTGGAAGCGAGAGTCTGGCGGTGCCATCG 
GCGGCCAGATGCGCCGACTCGGTGACGGGGTGGACTGGAGCCGCGACCGGTTCACCATGGAC 
GAAGGTCTGTCGCGGGCGGTGCGCACGATCTTCAAGCGGCTTTATGACGCCGGGCTGATCTAT 
CGGGCCGAGCGGCTGGTCAACTGGTCGCCGGTGCTGCAGACCGCGATCTCCGACGTCGAGGT 

CAACTACCGCGACGTCGAAGGCGAGCTGGTGTCGTTTAGGTACGGCTCGCTTGACGACTCGCA
ACCCCACATCGTGGTCGCCACCACCCGGGTCGAGACGATGCTGGGCGATACCGCGATCGCCGT 
CCATCCCGATGACGAGCGCTACCGTCACCTGGTCGGCACCAGCCTGGCGCACCCATTCGTCGA 
CCGGGAGCTGGCCATTGTCGCCGACGAGCACGTGGACCCTGAATTCGGCACCGGCGCGGTCA 
AAGTCACACCCGCCCACGACCCCAACGACTTCGAAATCGGGGTGCGCCACCAGCTGCCGATGC 

CCTCGATCCTGGACACCAAGGGCCGGATCGTCGACACCGGAAGGCGATTCGACGGCATGGACC
GCTTCGAGGCACGGGTCGCGGTGCGCCAAGCGCTCGCGGCCCAGGGCCGCGTGGTCGAAGAA 
AAGCGACCCTACCTGCACAGCGTCGGACACTCCGAACGCAGCGGCGAGCCGATCGAGCCGCG 
GCTATCCCTGCAGTGGTGGGTCCGGGTGGAATCGCTGGCCAAAGCGGCCGGGGATGCGGTGC

GCAACGGGGACACCGTGATTCACCCGGCCAGCATGGAACCCCGCTGGTTCTCCTGGGTCGACG 

ACATGCACGACTGGTGCATCTCGCGACAGCTCTGGTGGGGGCATCGGATCCCGATCTGGTACG 

GACCCGACGGCGAACAGGTGTGCGTCGGCCCGGACGAAACACCCCCGCAGGGCTGGGAACAG 

GATCCTGACGTGCTGGATACCTGGTTTTCGTCGGCGCTGTGGCCGTTTTCCACGCTGGGTTGGC 

CGGACAAGACGGCGGAGCTGGAAAAGTTCTATCCGACAAGCGTTCTGGTTACCGGCTATGACAT 

CTTGTTCTTTTGGGTGGCCAGAATGATGATGTTCGGCACCTTCGTCGGCGACGACGCCGCCATC 

ACCCTCGACGGCCGCCGGGGCCCGCAGGTGCCGTTCACCGACGTGTTTCTGCATGGGCTGATC 

CGCGACGAGTCTGGCCGCMGATGAGCAAGTCCAAGGGCAACGTCATCGACCCGCTGGATTGG 

GTGGAAATGTTCGGGGCCGATGCGCTGCGGTTCACGCTGGCCCGCGGGGCCAGTCCCGGTGG 

TGACTTGGCGGTGAGCGAGGATGCCGTGCGGGCGTCGCGCAATTTCGGGACCAAGCTGTTCAA 

CGCCACTCGGTACGCACTGCTCAATGGCGCCGCGCCAGCACCCCTGCCATCGCCGAACGAGCT 

GACCGACGCCGACCGCTGGATTCTCGGAAGGTTGGAAGAGGTTCGGGCCGAAGTTGATTCGGC 

CTTCGACGGATACGAGTTCAGCCGCGCTTGTGAGTCCCTGTATCACTTCGCCTGGGACGAATTC 

TGCGACTGGTACCTCGAACTGGCCAAAACGCAGCTTGCCCAGGGACTCACACACACCACCGCG 

GTGCTGGCCGCCGGGCTGGACACGGTGCTGCGCCTGCTGCACCCGGTGATTCCCTrCCTCAGC 

GAGGCGCTATGGCTGGCGCTGACCGGCAGGGAATCGCTGGTCAGCGCCGACTGGCCGGAGCC 

TTCCGGGATTAGCGTGGACCTTGTTGCCGCGCAACGGATTAAGGATATGCAGAAGTTGGTGACG 

GAAGTGCGGCGGTTCCGCAGCGATCAAGGTCTGGCCGACCGGCAGAAGGTTCCGGCCCGAAT 

GCACGGTGTGCGGGACTCGGATGTGAGCAACCAGGTGGCCGGCGTGACCTCGCTGGCGTGGC 

TCACCGAGCCGGGCCCGGATTTTGAGCCGTCGGTCTCGTTGGAGGTTCGGCTCGGCCCCGAGA 

TGAACGGCAGCGTCGTCGTCGAGCTCGACACCTCGGGCACCATCGACGTGGCCGCGGAGGGT 

CGCGGCCTGGAAAAGGAGTTGGCCGGCGCCCAAAAGGAGCTGGGGTCGACCGCCGCCAAGTT 

GGCCAAGGCGGACTTTCTGGCCAAAGCGCCCGACGCCGTCATTGCCAAGATCCGGGACCGCCA 

GCGCGTGGCGCAGCAGGAAACCGAGCGCATCACCACCCGGTTGGCTGCGCTGCAATGA 

>Rv2482c plsB2 TB.seq 2786915:2789281 MW:88284
>emb|AL123456|MTBH37RV:c2789281-2786912, plsB2 SEQ ID NO:98

gtgaccaaaccggcggccgatgccagcgcggtggttactgccgaggacacactggtgctggc 

ttccacggcgacgccggtcgagatggagctgatcatgggctggctggggcagcagcgtgcac 

gccatccggactcgaagttcgacatattgaagctgccagcgcgcaacgctgcggcggcggcgc 

tgacggcactggtggagcagctcgagcccggcttcgcatccagcccgcaatctggcgaggac 

cgttgtatcgtgccggttcgggtgatctggctgcctcccgccgatcgcagccgggggggcaag 

gtggccgcactggtcccgggtcgggatccctaccatcccagccagcgtcagcagcgtcggatc 

ctgcgtaccgatgcca6gcgcgcgcgggtggtggccggcgagtcggccaaggtgtccgaact 

gcgccagcagtggcgcgataccacggtggcagagcacaagcgcgatttcgcccagttcgtcag 

ccgccgagcgCtgttggcgctggcgcgcgccgaatatcggatccttggacggcaatacaaatc 




WO 01/35317 



PCT/US00/31152 



TCCCCGGCTGGTGAAGCCGGAQATGTTGGCGTGCGCACGATTTCGTGCCGGCCTGGACCGGAT 

TCCGGGCGCCACGGTCGAAGATGCCGGGAAGATGCTCGACGAACTCTCCACCGGATGGAGCC 

AGGTGTCGGTAGACCTGGTTTCCGTCCTCGGCAGGCTGGCTAGCCGCGGCTTCGATCCGGAAT 

TCGACTACGACGAGTATCAGGTCGCGGCGATGCGCGCCGCACTGGAGGCTCATCCGGCGGTC 

CTGCTGTTCTCGCACCGGTCCTACATCGACGGCGTGGTGGTACCGGTGGCCATGCAGGACAAC 

CGGTTACCGCCGGTGCACATGTTCGGCGGCATCAACCTGTCGTTCGGTCTCATGGGACCCCTC 

ATGCGGCGCTCGGGGATGATCTTCATCCGGCGCAATATCGGCAACGACCCACTGTATAAGTACG 

TGCTCAAGGAGTACGTGGGCTACGTGGTCGAGAAGCGGTTCAACCTGAGCTGGTCCATCGAAG 

GCACCCGGTCGCGCACCGGAAAGATGTTGCCGCCCAAGCTCGGTTTGATGAGCTACGTGGCCG 

ATGCTTACCTGGACGGCCGCAGTGACGACATCCTGCTGCAGGGGGTTTCGATTTGCTTCGATCA 

GCTGCACGAGATCACCGAATACGCCGCCTACGCGCGTGGCGCGGAGAAGACGCCCGAAGGTT 

TGCGCTGGCTCTACAACTTCATCMGGCGCAGGGGGAACGCAACTTCGGCAAGATCTACGTTCG 

CTTCCCCGAAGCGGTCTCGATGCGCCAGTACCTCGGCGCACCGCACGGCGAGCTGACCCAGG 

ATCCGGCCGCGAAACGGCTTGCGTTGCAGAAGATGTCGTTCGAGGTGGCCTGGAGGATTTTGC 

AGGCGACGCCGGTGACCGCGACGGGTTTGGTGTCCGCACTGCTGCTCACCACCCGCGGCACC 

GCGTTGACGCTCGACCAGCTGCACCACACGTTGCAGGACTCACTGGACTATCTGGAACGCAAA 

CAATCGCCGGTTTCGACAAGCGCATTGCGACTGCGCTCGCGCGAAGGCGTCCGTGCGGCGGC 

GGACGCGTTGTCCAACGGCCACCCGGTCACTCGGGTCGACAGTGGCCGGGAGCCGGTATGGT 

ACATAGCGCCTGACGACGAGCACGCCGCGGCGTTCTACCGGAACTCGGTGATCCATGCGTTTTT 

GGAGACCTCGATCGTCGAGCTCGCGCTGGCCCATGCCAAGCACGCCGAAGGTGACCGCGTCG 

CCGCGTTCTGGGCCCAGGCGATGCGGTTGCGGGATCTGCTGAAGTTCGACTTCTATTTCGCGG 

ATTCCACGGCGTTTCGGGCCAACATCGCCCAAGAGATGGCCTGGCACCAAGACTGGGAGGATC 

ATCTTGGCGTCGGGGGCAATGAGATCGACGCGATGCTGTATGCCAAACGGCCGCTGATGTCGG 

ACGGGATGTTGCGGGTCTTCTTCGMGCCTATGAGATGGTTGCCGACGTGTTGCGCGATGCTCC 

GCCTGACATCGGTCCTGAGGAGTTGACGGAGCTGGCGCTCGGCCTCGGCCGTCAGTTTGTGGC 

ACAGGGCCGGGTCCGCAGCAGCGAACCGGTATCGACGCTGCTGTTCGCCACTGCACGCCAGG 

TCGCCGTCGATCAGGAGCTGATAGCGCCGGCGGCCGACCTCGCCGAACGTAGGGTCGCCTTC 

CGGCGGGAGTTACGAAACATTCTGCGGGATTTCGACTATGTGGAGCAGATCGCGCGCAACCAG 

TTCGTCGCCTGCGAGTTCAAAGCGCGTCAAGGACGCGACCGAATCTAA 

>Rv2509 - putative oxidoreductase TB.seq 2824676:2825479 MW:28014 
>emb|AL123456|MTBH37RV:2824676-2825482, Rv2509 SEQ ID NO:99 

ATGCCGATACCCGCGCCCAGCCCGGACGCAGGTGCCGTTGTCACCGGGGCTTCGCAGAACATC 

GGCGCGGCGCTGGCCACCGAACTGGCCGCACGCGGGCACCACCTGATCGTCACCGCACGACG 

CGAGGACGTGTTGACCGAGTTGGCTGCCCGGCTGGCCGACAAGTACCGCGTGACGGTCGAGG 

TGCGACCGGCCGATCTGGCCGATCCGCAAGAACGATCGAAACTGGCCGACGAGCTGGCTGCC 

CGGCCCATCTCGATCCTGTGGGCCAACGCGGGTACCGCGACATTCGGCCCGATCGCATCGCTC 
GATCTTGCCGGCGAAAAGACGCAGGTGCAGTTGAATGCCGTGGCGGTGCACGACCTTACGTTG
GCGGTGTTGCCGGGCATGATCGAGCGCMGGCCGGCGGCATCTTGATTTCTGGTTCGGCGGCC 
GGCMTTCACCGATTCCCTACAACGCCACCTATGCCGCGACCAAGGCCTTCGTGAACACCTTCA 
GCGAATCTCTGCGCGGTGAGCTACGCGGCTCCGGCGTGCACGTCACGGTGCTGGCCCCGGGC 
CCGGTTCGCACCGAGCTACCGGATGCCTCCGAAGCGTCACTGGTCGAGAAGCTGGTGCCGGAC
TTCCTGTGGATCTCGACGGAGCACACCGCCCGGGTATCGCTGAATGCCTTGGAGCGCAACAAG 
ATGCGCGTCGTTCCGGGTCTGACGTCAAAGGCGATGTCGGTGGCCAGCCAATACGGTCCGCGC 
GCCATCGTGGCGCCMTCGTGGGTGCCTTTTACAAGAGGCTTGGGGGCAGCTAG 

>Rv2524c fas fatty acid synthase TB.seq 2840124:2849330 MW:326226
>emb|AL123456|MTBH37RV:c2849330-2840121, fas SEQ ID NO:100 

GTGACGATCCACGAGCACGACCGGGTGTCCGCTGATCGCGGCGGGGACAGCCCGCATACCAC 
CCACGCTCTGGTCGATCGCCTCATGGCTGGTGAGCCCTACGCTGTCGCATTCGGTGGCCAGGG 
CAGCGCCTGGCTGGAAACCCTCGAAGAGCTGGTGTCGGCCACCGGGATAGAAACCGAGTTGGC 

GACGTTGGTCGGTGAGGCAGAGCTGTTGCTCGATCCGGTCACCGACGAGCTGATTGTGGTGCG
CCCGATCGGTTTCGAGCCGCT6CAATGGGTACGCGCACTGGCGGCCGAGGACCCGGTTCCGT 
CCGACAAGCACCTGACGTCGGCCGCCGTGTCGGTGCCCGGCGTGTTGCTTACCCAGATCGCGG 
CGACCCGGGCGCTGGCCCGTCAAGGCATGGACCTCGTGGCCACCCGGCCGGTCGCCATGGCG 
GGGCATTCGCAAGGTGTGCTGGCGGTGGAAGCCCTCAAGGCTGGTGGGGCACGCGACGTCGA 

GCTGTTTGCCTTGGCCCAGTTGATCGGTGCCGCCGGAACGCTGGTGGCCCGCCGGCGCGGAA
TTTCCGTCCTGGGCGATCGCCCGCCGATGGTATCGGTCACCAACGCCGACCCCGAGCGCATCG 
GCCGGTTGCTCGACGAGTTCGCCCAGGACGTGCGCACGGTGCTGCCACCGGTGTTGTCCATCC 
GCMCGGCCGGCGTGCCGTCGTCATCACCGGCACCCCCGAGCAGCTGTCGCGTTTCGAGCTTT 
ATTGCCGCCAGATCTCCGAGAAGGAAGAAGCCGACCGCAAGAACAAGGTCCGCGGCGGCGAC 

GTCTTCTCGCCGGTCTTCGAGCCGGTGCAGGTGGAGGTGGGCTTTCACACCCCGCGGCTATCC
GACGGGATCGACATCGTCGCGGGCTGGGCCGAGAAGGCGGGCCTCGATGTCGCCTTGGCTCG 
GGAGCTGGCCGATGCCATCTTGATCAGAAAGGTCGACTGGGTCGACGAGATCACCCGTGTCCA 
CGCGGCCGGCGCCCGCTGGATCCTCGACCTGGGGCCGGGCGACATCCTGACCCGACTGACCG 
CACCGGTGATCCGCGGCCTGGGCATCGGCATCGTGCCGGCGGCTACCCGCGGTGGCCAGCGC 

AACCTGTTCACCGTCGGCGCCACCCCCGAGGTTGCCCGGGCCTGGTCGAGCTACGCACCGACC
GTGGTTCGCCTCCCCGACGGCAGGGTCAAGCTCTCGACGAAGTTCACCCGGCTGAGCGGCCGC 
TCGCCGATCCTGCTCGCGGGCATGACCCCGACCACCGTGGACGCCAAGATCGTCGCCGCGGC 
GGCCAACGCCGGGCACTGGGCCGAGCTGGCCGGCGGCGGGCAGGTCACCGAAGAGATCTTC 
GGTAACCGCATCGAACAAATGGCCGGCCTGCTCGAGCCGGGCCGCACCTATCAGTTCAACGCG 

CTGTTCCTCGATCCCTACCTGTGGAAGCTTCAGGTGGGCGGCAAGCGGTTGGTGCAGAAGGCC
CGCCAGTCCGGCGCCGCGATCGACGGCGTGGTGATCAGCGCCGGCATCCCAGACCTCGACGA 
GGCCGTCGAGCTGATCGACGAACTGGGCGACATCGGCATCAGCCACGTCGTGTTCAAACCCGG 
GACCATCGAGCAGATCCGCTCGGTGATTCGCATCGCCACCGAGGTGCCCACCAAGCCGGTGAT
CATGCACGTCGAGGGCGGGCGCGCCGGCGGGCACCATTCCTGGGAGGATCTCGACGACCTGC 
TGCTGGCTACCTACTCGGAGTTGCGCTCACGCGCCAACATCACGGTGTGCGTCGGCGGCGGCA 
TTGGCACCCCGAGMGGGCTGCGGMTATTTGTCCGGGCGCTGGGCGCAGGCCTACGGCTTCC 

CATTGATGCCGATCGACGGCATCCTGGTCGGCACCGCGGCGATGGCCACCAAGGAATCCACCA
CGTCGCCATCGGTCAAGCGGATGCTCGTCGACACTCAGGGCACCGACCAATGGATCAGCGCCG 
GAAAAGCGCAGGGCGGCATGGCCTCCAGCCGCAGTCAGCTCGGTGCCGATATCCACGAGATC 
GACAACAGCGCATCCCGGTGCGGGCGGCTGCTCGACGAGGTGGCCGGTGACGCGGAGGCGG 
TCGCGGAGCGTCGCGACGAGATCATCGCGGCGATGGCCAAGACCGCCAAGCCCTACTTCGGC 

GACGTCGCCGACATGACCTACCTGCAGTGGCTGCGGCGCTACGTCGAACTGGCCATCGGGGAA
GGCAACTCGACCGCCGACACCGCCTCGGTGGGCAGCCCGTGGCTGGCCGACACCTGGCGGGA 
CCGCTTCGAGCAGATGCTGCAGCGTGCCGAAGCCCGGTTGCACCCACAGGATTTCGGCCCGAT 
CCAGACGCTATTCACCGATGCTGGCCTGCTGGACAATCCGCAGCAGGCGATCGCCGCCCTGCT 
GGCGCGCTACCCCGACGCCGAGACCGTGCAGTTGCATCCCGCGGATGTGCCCTTTTTCGTGAC 

GTTGTGCAAGACGCTGGGCAAGCCGGTCAACTTCGTGCCGGTGATCGACCAGGACGTGCGGC
GCTGGTGGCGCAGCGACTCGCTGTGGCAGGCCCACGACGCCCGCTACGACGCCGATGCGGTG 
TGCATCATTCCGGGCACCGCGTCGGTAGCCGGCATCACCCGGATGGATGAACCCGTCGGTGAG 
TTGCTGGAGCGTTTCGAGCAAGCCGCAATCGATGAAGTGCTCGGCGCCGGTGTCGAGCCGAAG 
GATGTCGCGTCGCGCCGGCTGGGCCGCGCCGACGTGGCCGGACCGTTGGCTGTCGTCCTCGA 

CGCACCCGATGTGCGCTGGGCCGGTCGCACCGTGACCAACCCGGTGCATCGGATCGCCGACC
CGGCCGAATGGCAGGTGCACGATGGACCCGAAAACCCGCGCGCCACACACTCATCCACCGGC 
GCCGGGCTGCAGACGCACGGCGACGACGTCGCCTTGAGCGTGCCCGTCTCGGGCACCTGGGT 
CGACATCCGATTCACGTTGCCGGCGAACACCGTCGATGGCGGCACCCCGGTGATCGCCACCGA 
GGACGCCACCAGCGCCATGCGCACGGTGCTGGCGATCGCCGCCGGTGTCGACAGCCCGGAGT 

TCTTGCCTGCGGTGGCCAACGGGACGGCCACTTTGACGGTGGACTGGCACCCCGAGCGTGTTG
CCGACCACACCGGCGTCACCGCCACGTTCGGTGAGCCGCTGGCACCCAGCCTCACCAACGTG 
CCCGACGCGCTCGTCGGCCCTTGTTGGCCAGCGGTTTTCGCGGCCATCGGATCGGCGGTCACC 
GACACCGGTGAGCCGGTGGTGGAAGGCCTGCTGAGCCTGGTGCATCTGGACCACGCCGCCCG 
CGTGGTCGGTCAGCTGCCCACGGTCCCGGCCCAATTGACCGTCACCGCAACGGCTGCCAACGC 

AACCGATACGGACATGGGCCGCGTCGTGCCGGTCTCGGTCGTCGTTACCGGCGCGGATGGCG
CCGTGATCGCCACTCTCGAGGAGCGATTCGCGATCCTGGGTCGCACCGGTTCCGCCGAGCTCG 
CCGACCCGGCGCGAGCCGGTGGCGCGGTGTCGGCGAACGCCACCGACACCCCGCGCCGTCG 
CCGCCGCGACGTCACGATCACCGCGCCGGTCGACATGCGCCCGTTCGCGGTGGTGTCCGGCG 
ACCACAACCCCATTCACACCGACCGGGGCGCCGCGCTGCTTGCCGGCCTGGAGTCGCCGATC 

GTGCACGGCATGTGGCTGTCGGCCGGGGCGCAACACGCGGTGACCGCCACCGACGGGCAGG
CCCGGCCACCGGCCCGGCTGGTCGGCTGGACCGCGCGGTTTTTGGGCATGGTGCGCCCCGGC 
GACGAGGTGGACTTCCGCGTCGAGCGCGTCGGAATCGACCAGGGCGCAGAGATTGTGGACGT 
G6CCGCGCGCGTCGGGTCGGATCTAGTGATGTCGGCCTCCGCGCGACTQGCCGCACCCAAGA
CGGTCTACGCATTCCCCGGCCAGGGCATCCAACACAAGGGCATGGGCATGGAGGTGCGCGCC 
CGCTCCAAGGCGGCCCGCAAGGTGTGGGACACCGCGGACAAGTTCACCCGCGACACCCTGGG 
CTTCTCGGTACTGCACGTGGTCCGCGACAACCCGACCAGCATCATCGCCAGCGGTGTGCACTA 

CCACCACCCCGACGGGGTGCTCTACCTGACGCAGTTCACCCAGGTCGCGATGGCGACGGTGG
CGGCCGCGCAGGTCGCCGAGATGCGTGAACAGGGAGCCTTCGTCGAAGGCGCCATCGCGTGC 
GGCCACTCGGTCGGCGAGTACACCGCGCTGGCCTGCGTGACCGGCATCTACCAACTGGAAGC 
CTTGCTGGAGATGGTGTTTCACCGCGGGTCGAAGATGCACGACATCGTTCCGCGCGACGAGCT 
CGGCCGCTCCAACTATCGGCTGGCGGCCATCCGGCCGTCCCAGATCGACCTCGACGACGCCG 

ACGTGCCCGCGTTCGTCGCCGGGATCGCGGAGAGCACCGGTGAATTCCTGGAGATCGTGAATT
TCAACCTGCGTGGCTCGCAATACGCGATCGCGGGCACGGTACGCGGCCTCGAGGCGCTCGAG 
GCCGAGGTGGAGCGGCGCCGCGAGCTCACCGGCGGCCGACGGTCGTTCATTTTGGTGCCCGG 
CATCGATGTTCCGTTCCACTCGCGAGTGCTGCGGGTCGGGGTGGCCGAATTCCGGCGCTCGCT 
GGACCGGGTCATGCCGCGCGACGCGGACCCCGACCTGATCATCGGGCGCTACATTCCCAACCT 

GGTGCCGCGGTTGTTCACCCTGGACCGCGACTTCATCCAGGAAATCCGGGATTTGGTGCCCGC
CGAGCCGCTCGACGAGATCCTCGCCGACTACGACACCTGGCTTCGCGAGCGTCCGCGCGAGAT 
GGCGCGCACGGTGTTCATCGAGCTGCTGGCATGGCAATTCGCCAGCCCGGTGCGCTGGATCGA 
GACGCAGGATCTGCTGTTCATCGAGGAGGCCGCCGGCGGGCTGGGTGTGGAGCGATTCGTCG 
AGATCGGTGTGAAGAGCTCACCGACGGTGGGGGGTCTTGCCACCAACACCCTCAAACTGCCCG 

AATACGCCCACAGCACAGTGGAAGTGCTCAACGCCGAGCGTGATGCCGCGGTGCTGTTCGCCA
CCGACACCGACCCGGAGCCGGAGCCGGAGGAAGACGAGCCGGTCGCGGAATCGCCCGCGCC 
GGACGTCGTCTC66AAGCCGCCCCCGTCGCGCCGGCCGCTTCGTCGGCGGGCCCGCGTCCCG 
ACGATCTGGTTTTCGACGCCGCCGATGCCACGCTGGCGCTGATCGCGCTCTCGGCCAAGATGC 
GCATCGACCAGATCGAAGAACTCGACTCCATCGAGTCCATCACCGACGGTGCGTCGTCGCGGC 

GCAACCAGCTGCTGGTGGACCTGGGCTCCGAGCTGAACCTCGGTGCCATTGACGGCGCCGCC
GAATCGGACCTGGCCGGTCTGCGCTCACAGGTGACCAAACTGGCGCGCACCTACAAGCGTTAC 
GGCCCAGTGCTTTCCGACGCCATCAACGACCAGCTTCGCACCGTCGTCGGACCGTCGGGCAAG 
CGGCCCGGCGCCATCGCCGAGCGGGTGAAGAAGACCTGGGAGCTCGGTGAGGGCTGGGCCA 
AGCATGTCACCGTCGAGGTCGCGCTGGGCACCCGCGAGGGCAGCAGCGTTCGCGGCGGCGCC 

ATGGGCCACCTGCACGAGGGCGCGCTGGCCGATGCCGCCTCCGTCGACAAGGTCATCGACGC
GGCGGTCGCATCGGTGGCCGCGCGCCAGGGCGTTTCGGTAGCGCTGCCGTCGGCCGGTAGTG 
GTGGCGGCGCCACCATCGACGCGGCCGCGCTCAGCGAGTTCACCGACCAAATCACCGGCCGT 
GAGGGCGTGCTGGCCTCCGCGGCCCGCCTGGTGCTGGGGCAGCTGGGACTGGACGACCCCGT 
CAACGCCTTGCCGGCCGCCCCCGATTCCGAGCTGATCGACTTGGTCACCGCCGAACTGGGAGC 

GGACTGGCCGCGGTTGGTGGCACCGGTGTTCGACCCCAAGAAGGCCGTCGTATTCGACGACC
GCTGGGCCAGCGCCCGCGAGGACCTGGTGAAGCTGTGGCTGACCGACGAGGGCGACATCGAC 
GCCGACTGGCCGCGCCTGGCGGAGCGGTTCGAGGGTGCCGGCCACGTCGTGGCGACCCAGG 
CTACCTGGTGGCAAGGTAAGTCGCTGGCCGCGGGCCGGCAGATCCATGCATCGCTGTACGGCC
GCATCGCCGCCGGCGCCGAGAACCCCGAACCCGGCCGCTACGGCGGCGAAGTTGCCGTGGTG 
ACCGGCGCTTCGAAGGGTTCGATCGCCGCGTCGGTGGTGGCTCGGCTGCTCGACGGCGGAGC 
CACCGTCATCGCGACCACCTCCAAGCTCGACGAGGAGCGGCTGGCGTTCTACCGCACGCTGTA 
TCGCGACCACGCCCGTTACGGCGCGGCGCTGTGGCTGGTCGCGGCGAACATGGCGTCCTACT
CCGACGTCGACGCCCTGGTCGAATGGATCGGCACCGAACAGACCGAAAGCCTTGGGCCGCAGT 
CGATTCACATCAMGACGCGCAGACCCCGACGCTGCTGTTCCCGTTCGCGGCGCCACGCGTGG 
TCGGGGACCTGTCGGAGGCCGGTTCGCGCGCCGAGATGGAGATGAAAGTGCTGCTGTGGGCC 
GTGCAACGGCTGATCGGCGGCCTGTCGACGATCGGCGCCGAACGCGACATCGCGTCGCGGCT 

GCACGTGGTGCTGCCCGGCTCGCCCAACCGTGGCATGTTCGGCGGCGACGGCGCCTACGGCG
AAGCCAAGTCCGCGCTGGATGCCGTGGTGAGCCGCTGGCACGCCGAGTCGTCCTGGGCGGCA 
CGGGTCAGCCTGGCGCACGCGCTCATCGGCTGGACCCGCGGCACCGGGCTGATGGGCCACAA 
CGATGCCATCGTGGCCGCCGTCGAAGAGGCCGGGGTCACCACCTACTCGACCGACGAGATGG 
CGGCGCTGCTGCTCGACCTGTGTGATGCGGAATCCAAGGTGGCTGCGGCGCGTTCGCGGATCA 

AGGCCGACCTGACCGGGGGCCTGGCCGAGGCCAACCTCGACATGGCCGAGCTGGCGGCCAAG
GCGCGCGAGCAGATGTCGGCAGCGGCGGCCGTCGACGAGGACGCCGAGGCCCCTGGCGCCA 
TCGCCGCGCTGCCGTCGCCGCCCCGGGGTTTCACCCCCGCACCGCCGCCGCAATGGGACGAC 
CTCGATGTCGACCCGGCCGACCTGGTGGTGATCGTCGGCGGCGCCGAAATCGGCCCGTACGG 
CTCGTCACGCACCCGGTTCGAGATGGAGGTCGAAAACGAGCTGTCGGCGGCCGGCGTGCTGG 

AGCTGGCCTGGACCACTGGGTTGATCCGCTGGGAGGACGACCCGCAACCCGGTTGGTACGACA
CCGAATCCGGCGAAATGGTCGACGAATCCGAGTTGGTGCAGCGCTACCACGACGCCGTGGTGC 
AGCGCGTCGGCATTCGCGAATTCGTTGATGACGGCGCGATCGACCCCGACCACGCCTCGCCGC 
TGCTGGTGTCGGTGTTCCTGGAGAAGGACTTCGCGTTCGTGGTGTCCTCGGAGGCCGATGCGC 
GCGCCTTCGTCGAGTTCGATCCCGAGCACACGGTCATCCGGCCGGTGCCCGACTCCACCGACT 

GGCAGGTCATCCGCAAGGCCGGCACCGAGATCCGGGTGCCGCGAAAGACCAAGCTGTCCCGC
GTCGTCGGCGGCCAGATCCCGACCGGGTTCGACCCGACGGTGTGGGGCATCAGCGCAGACAT 
GGCCGGTTCCATCGACCGGTTGGCGGTATGGAACATGGTGGCGACCGTCGACGCGTTCCTGTC 
GTCCGGTTTCAGCCCGGCCGAGGTGATGCGTTACGTGCACCCGAGTTTGGTGGCCAACACCCA 
GGGCACCGGCATGGGCGGCGGCACGTCGATGCAGACGATGTACCACGGCAATCTGTTGGGCC 

GCAACAAGCCGAACGACATCTTCCAGGAAGTCTTGCCGAATATCATTGCCGCGGACGTGGTTCA
GTCCTACGTCGGTAGCTACGGTGCGATGATCCACCCGGTAGCCGCGTGCGCCACCGCCGCGGT 
GTCGGTCGAGGAAGGTGTCGACAAGATCCGGTTGGGCAAGGCTCAACTGGTGGTGGCCGGCG 
GCCTGGATGACCTGACGCTGGAGGGCATCATCGGATTCGGTGACATGGGGGCCACCGCCGACA 
GGTCCATGATGTGCGGCCGCGGCATCCACGACTCGMGTTTTCCCGGCCCAACGACCGCCGCC 

GTCTGGGCTTCGTCGAAGCCCAAGGCGGCGGGACGATCCTGTTGGCCCGCGGGGACCTGGCG
CTGCGGATGGGGCTGCCGGTGCTGGCGGTGGTGGCGTTCGCGCAGTCGTTCGGCGACGGCGT 
GCACACCTCGATCCCGGCCCCGGGCCTGGGCGCGCTGGGGGCGGGCCGCGGCGGCAAGGAT 
TCACCGCTGGCGCGGGCGCTGGCCAAGCTGGGCGTGGCCGCCGACGACGTGGCGGTCATCTC

CAAGCACGACACCTCGACGCTGGCCAACGATCCCAACGAGACCGAGTTGCATGAACGGCTCGC 

CGACGCCCTGGGCCGTTCCGAGGGCGCCCCGCTGTTCGTGGTGTCGCAGAAGAGCCTGACCG 

GCCACGCCAAGGGCGGCGCGGCGGTCTTCCAGATGATGGGGCTCTGCCAGATATTGCGGGAT 

GGGGTGATCCCACCCAACCGCAGCCTCGACTGCGTCGACGACGAGCTGGCCGGGTCCGCGCA 

TTTCGTGTGGGTGCGTGAGACGTTGCGGCTCGGCGGCAAGTTCCCACTCAAGGCCGGCATGCT 

GACCAGCCTCGGGTTCGGCCATGTGTCGGGCCTGGTCGCGTTGGTGCATCCGCAGGCGTTCAT 

CGCCTCGCTGGATCCCGCACAGCGCGCGGACTACCAGCGGCGTGCCGACGCCCGCCTGCTGG 

CCGGTCAGCGCCGGCTGGCCTCGGCGATTGCCGGTGGTGCGCCGATGTACCAGCGGCCCGGT 

GACCGTCGCTTCGACCACCACGCGCCCGAGCGGCCGCAGGAGGCGTCGATGCTGCTGAATCC 

GGCGGCCCGGCTGGGTGACGGCGAGGCGTATATCGGCTGA 

>Rv2555c alaS alanyl-tRNA synthase TB.seq 2873772:2876483 MW:97326 
>emb|AL123456|MTBH37RV:c2876483-2873769, alaS SEQ ID NO:101 

GTGCAGACACACGAGATCAGGAAGCGGTTCCTCGATCATTTCGTGAAGGCGGGCCACACCGAG 

GTGCCCAGCGCCTCGGTGATCGTCGACGACCCCAACCTGTTGTTCGTCAACGCCGGGATGGTC 

CAGTTCGTGCCTTTCTTCTTGGGACAGCGCACGCCGCCGTACCCGACGGCCACCAGCATCCAG 

AAGTGCATCCGTACCCCCGATATCGACGAGGTGGGCATAACGAGGGGGCACAACACGTTTTTTC 

AGATGGCCGGCAATTTCAGCTTCGGCGACTATTTCAAACGCGGGGCCATTGAACTGGCCTGGG 

CACTGCTGACCAACAGCCTCGCCGCCGGCGGCTACGGCCTGGACCCGGAAAGAATCTGGACG 

ACAGTCTATTTCGACGACGACGAAGCTGTCCGGCTATGGCAGGAGGTTGCCGGGCTGCCGGCG 

GAGCGAATCCAGCGCCGCGGCATGGCCGACAACTACTGGTCGATGGGCATTCCCGGACCGTG 

CGGGCCGTCATCGGAGATCTATTACGACCGCGGACCCGAATTCGGTCCCGCAGGCGGTCCCAT 

CGTGAGCGAAGACCGCTACCTCGAGGTCTGGAACCTGGTGTTCATGCAGAACGAGCGCGGAGA 

GGGAACCACCAAGGAGGACTACCAGATCCTCGGGCCGCTGCCCCGCAAGAACATCGACACCG 

GCATGGGCGTCGAGCGGATCGCGCTGGTGGTGCAA6ACGTGCACAACGTCTACGAGACCGAC 

CTGCTCAGGCCGGTCATCGATACCGTGGCCAGGGTCGCCGCGCGTGCCTACGACGTCGGCAA 

CCACGAAGACGACGTGCGGTACCGCATCATCGCAGACCACAGGGGCACCGCCGCGATCCTGAT 

CGGTGACGGCGTCAGCCCCGGCAACGACGGTCGCGGTTATGTGCTGCGCCGGCTGCTGCGTC 

GGGTGATCCGCTCCGCCAAGCTGCTGGGCATCGACGCTGCGATCGTTGGCGACCTGATGGCCA 

CGGTGCGCAACGCGATGGGCCCGTCATATCCCGAACTCGTCGCCGACTTCGAGCGGATCAGCC 

GGATCGCGGTCGCCGAGGAGACGGCGTTCAACCGCACGCTGGCGTCGGGTTCCAGGCTGTTC 

GAGGAGGTGGCTAGCTCCACCAAGAAATCCGGAGCCACCGTGCTGTCCGGATCGGACGCTTTC 

ACGTTGCATGACACCTACGGGTTCCCGATCGAGCTCAGGCTGGAGATGGCGGCCGAAACCGGT 

GTGCAGGTAGACGAAATCGGGTTCCGTGAGCTGATGGCCGAGCAGCGCCGCCGTGCCAAGGC 

CGACGCCGCCGCGCGCAAACACGCGCATGCTGACCTGAGCGCCTACGGCGAGCTGGTTGACG 

CCGGCGCCACCGAGTTCACCGGATTCGACGAGTTGCGTTCCCAGGCGCGGATTCTGGGCATCT 
TCGTCGACGGTAAGCGGGTTCCGGTGGTGGCGCACGGTGTAGCCGGCGGAGCCGGGGAAGG
GCAGCGTGTCGAACTTGTCTTAGATCGCACCCCGCTCTACGCCGAATCGGGTGGGCAGATCGC 
CGATGAGGGCACCATCAGCGGAACCGGTTCCAGCGAAGCTGCCCGGGCCGCGGTTACCGACG 
TGCAGAAGATCGCCAAMCGCTTTGGGTGCACCGAGTCAACGTGGAATCCGGGGAATTCGTCG 

AGGGTGACACCGTAATCGCGGCGGTGGATCCCGGGTGGCGCCGGGGTGCCACGCAGGGCCA
CTCGGGCACCGACATGGTGCATGCCGCGCTGCGACAAGTGCTGGGGCCCAACGCGGTTCAGG 
CGGGATCGCTGAACCGGCCGGGATATTTGCGCTTCGACTTTAACTGGCAGGGTCCGTTGACCG 
ACGACCAGCGCACCCAGGTCGAAGAGGTCACCAACGAGGCCGTGCAAGCGGACTTCGAGGTG 
CGCACGTTCACCGAACAGCTCGACAAGGCCAAGGCGATGGGTGCCATCGCGCTGTTCGGCGAG 

AGCTACCCCGACGAAGTGCGGGTGGTGGAGATGGGTGGACCGTTCTCGCTGGAGCTATGTGGC
GGCACCCATGTGAGCAACACGGCGCAGATCGGTCCCGTGACGATCCTGGGCGAGTCGTCGATC 
GGCTCCGGGGTGCGCC6GGTGGAGGCCTACGTGGGGTTGGATTCGTTTCGTCACCTGGCCAA 
GGAGCGTGCGTTGATGGCCGGGTTGGCCTCGTCACTGAAGGTGCCGTCCGAAGAGGTACCGG 
CCCGGGTGGCCAATCTAGTGGAGCGCCTGCGGGCCGCCGAGAAGGAACTCGAACGTGTCCGG 

ATGGCCAGCGCCCGGGCAGCCGCCACCAATGCCGCCGCCGGGGCTCAGCGGATCGGTAACGT
CCGTTTGGTGGCGCAGCGAATGTCCGGCGGGATGACCGCGGCAGACCTGCGGTCGTTGATCG 
GCGACATCCGCGGCAAGCTGGGTAGCGAGCCGGCGGTGGTGGCGCTGATTGCCGAGGGCGAA 
AGCCAAAGTGTGCCGTATGCGGTCGCGGCCAATCCCGCTGCCCAGGACCTCGGAATCCGTGCC 
AACGACCTGGTCAAACAACTTGCGGTGGCGGTCGAAGGCCGCGGTGGCGGTAAGGCGGACCT 

GGCGCAGGGCTCGGGAAAGAATCCGACCGGTATCGACGCCGCGCTCGACGCGGTCCGCTCCG
AGATCGCCGTGATAGCGCGGGTCGGTTGA 

>Rv2580c hisS histidyl-tRNA synthase TB.seq 2904822:2906090 MW:45118
>emb|AL123456|MTBH37RV:c2906090-2904819, hisS SEQ ID NO:102

GTGACGGAATTCTCGTCATTTTCGGCCCCCAAGGGGGTACCGGACTACGTCCCGCCCGACTCG
GCGCAGTTCGTCGCGGTGCGCGACGGGCTGCTCGCGGCGGCCCGTCAAGCCGGCTATAGCCA 
CATCGAGCTGCCCATCTTCGAGGACACCGCCCTGTTCGCCCGGGGCGTGGGTGAATCCACCGA 
CGTGGTGTCCAAGGAGATGTATACGTTCGGCGACCGTGGCGACCGCTCGGTGACGCTGCGGCC 
CGAGGGCACCGCCGGGGTGGTGCGTGCGGTGATCGAACACGGGCTGGATCGCGGCGCGCTG 

CCGGTGAAGTTGTGTTATGCGGGCCCGTTTTTCCGCTACGAGCGTCCGCAGGCCGGCCGGTAT
CGCCAGTTACA6CAAGTCGGGGTGGAGGCGATCGGCGTCGACGACCCGGCGTTGGACGCCGA 
GGTGATCGCCATTGCCGACGCCGGGTTCCGCTCGTTGGGTCTCGACG6GTTCCGGCTGGAAAT 
CACCTCCCTGGGAGACGAGAGTTGCCGTCCGCAGTACCGGGAACTGTTGGAGGAGTTCTTGTTT 
GGACTCGATCTCGACGAGGACACCCGCAGGCGCGCAGGGATCAATCCGCTGCGGGTGCTCGA 

CGACAAGCGACCCGAATTGCGTGCGATGACGGCGTCGGCGCCGGTGTTGCTGGATCATCTGTC
TGATGTCGCCAAGCAGCATTTCGACACCGTGCTCGCCCATCTGGACGCGCTTGGAGTGCCCTAT 
GTCATCAACCCGCGCATGGTGCGCGGCGTGGACTACTACACCAAGACCGCCTTCGAGTTGGTC 
CATGACGGGCTTGGTGCGCAATCGGGGATCGGCGGCGGGGGGCGCTACGACGGCGTGATGCA
CCAGCTTGGCGGGCAGGACTTGTCGGGCATCGGGTTCGGGCTGGGCGTGGACCGGACCGTGC 
TGGCGCTGGGGGCCGAGGGCAAGACGGCGGGGGACAGCGCCCGGTGCGACGTGTTCGGCGT 
GCCGCTTGGCGAGGCGGCCAAGCTCAGGCTGGCGGTGCTGGCTGGACGACTGCGCGCGGCC 
GGGGTGCGGGTTGACCTTGCCTATGGTGATCGCGGGCTCAAAGGCGCGATGCGCGCGGCCGC
TCGTTCCGGCGCCCGTGTTGCGTTGGTAGCGGGCGACCGCGACATCGAGGCCGGGACGGTCG 
CAGTGAAGGACTTGACGACGGGTGAGCMGTTTCGGTCTCGATGGATTCGGTTGTGGCCGAAG 
TAATTTCGCGGCTGGCTGGGTAG 

>Rv2614c thrS threonyl-tRNA synthase TB.seq 2941190:2943265 MW:77123
>emb|AL123456|MTBH37RV:c2943265-2941187, thrS SEQ ID NO:103

ATGAGCGCCCCCGCACAACCCGCCCCGGGAGTCGATGGCGGCGACCCGTCGCAAGCCCGAAT 
TCGGGTTCCTGCCGGGACCACCGCGGCCACCGCCGTCGGCGAAGCGGGTTTACCGCGGCGCG 
GTACGCCCGATGCGATCGTCGTCGTGCGCGACGCCGACGGCAACCTGCGCGACCTGAGCTGG 

GTGCCCGACGTCGACACCGATATCACGCCGGTGGCCGCCAACACCGACGACGGTCGCAGCGT
GATCCGCCATTCGACCGCGCACGTGTTGGCCCAAGCCGTCCAAGAGGTGTTTCGGCAGGCCAA 
GCTCGGCATCGGACCACCCATCACCGACGGCTTCTACTACGACTTCGACGTGCCCGAGCCGTT 
CACGCCCGAGGACTTGGCGGCGCTGGAAAAGCGGATGCGCCAGATCGTCAAGGAAGGCCAGC 
TGTTCGACCGGCGGGTCTACGAATCCACCGAACAGGCCCGCGCCGAGCTGGCCAACGAGCCC 

TACAAGCTGGAACTCGTCGACGACAAATCGGGTGACGCCGAGATCATGGAGGTCGGCGGTGAC
GAGCTCACCGCCTACGACAACCTCAACCCCCGCACCCGCGAGCGCGTCTGGGGCGACCTGTG 
CCGCGGACCGCACATCCCGACCACCAAACACATCCCGGCGTTCAAGCTCACCCGCAGCTCGGC 
CGCCTACTGGCGGGGCGATCAGAAAAACGCCAGCCTGCAACGGATCTACGGCACCGCGTGGG 
AATCCCAGGAGGCGCTCGACAGGCACCTGGAGTTCATCGAAGAGGCGCAGCGCCGGGACCAC 

CGCAAGCTGGGTGTCGAGCTGGACCTGTTCAGCTTCCCCGAGGAAATCGGTTCCGGCCTAGCG
GTTTTCCACCCCAAGGGCGGCATCGTGCGTCGCGAACTGGAGGACTACTCGCGGCGCAAGCAC 
ACCGAGGCGGGCTACCAGTTCGTCAACAGCCCGCACATCACCAAGGCCCAGTTGTTCCACACC 
TCGGGACATCTGGACTGGTACGCGGACGGCATGTTCCCCCCGATGCACATCGACGCGGAGTAC 
AACGCCGACGGCTCGCTGCGCAAACCCGGCCAGGACTACTACCTCAAGCCGATGAACTGCCCG 

ATGCACTGCCTGATCTTCCGCGCGCGCGGGCGATCCTATCGGGAACTGCCGTTGCGGCTCTTC
GAGTTCGGCACGGTGTATCGCTACGAGAAGTCCGGTGTGGTGCACGGGTTGACCCGGGTGCGT 
GGGCTGACCATGGACGACGCGCACATCTTCTGCACCCGCGACCAGATGCGCGACGAGCTGCG 
GTCGCTGCTGCGGTTTGTGCTCGACCTGCTCGCCGACTACGGCCTCACCGACTTCTACCTCGAA 
CTGTCCACCAAGGACCCGGAGAAGTTCGTCGGCGCCGAGGAGGTCTGGGAGGAAGCCACCAC 

CGTGCTGGCC6AGGTGGGCGCCGAATCCGGGCTGGAGCTGGTGCCCGATCCAGGCGGCGCG
GCGTTCTACGGGCCCAAGATTTCAGTGCAGGTCAAAGACGCGCTGGGCCGCACCTGGCAGATG 
TCGACGATCGAGGTGGACTTCMCTTTCCGGAACGTTTCGGCCTGGAGTACACCGCCGCCGACG 
GAACCCGCCACCQCCCGGTGATGATCCACCGCGCGCTATTTG6GTCGATCGAGC6G1TCTTCG
GCATTCTCACCGAGCACTACGCGGGGGCGTTCCCGGCCTGGTTGGCGCCCGTGCAGGTGGTC 
GGCATCCCGGTCGCCGATGAGCACGTCGCCTATCTGGAAGAGGTTGCCACGCAACTGAAGTCG 
CACGGGGTGCGGGCCGAGGTGGACGCCAGCGACGATCGGATGGCCAAGAAGATCGTGCACCA 
CACCAACCACAAGGTGCCGTTCATGGTGTTGGCGGGTGATCGTGACGTCGCCGCCGGCGCGGT
GAGTTTCCGGTTCGGTGACCGCACCCAAATCAACGGTGTGGCCCGTGACGATGCGGTGGCGGC 
CATTGTCGCCTGGATCGCTGACCGCGAAAATGCGGTTCCTACAGCGGAACTGGTGAAAGTGGC 
CGGTCGTGAGTGA 

>Rv2697c dut deoxyuridine triphosphatase TB.seq 3013683:3014144 MW:15772
>emb|AL123456|MTBH37RV:c3014144-3013680, dut SEQ ID NO:104 

GTGTCGACCACTCTGGCGATCGTCCGCCTCGACCCCGGGCTCCCGCTGCCCAGCCGCGCTCAC 
GACGGCGACGCCGGCGTTGATCTCTACAGCGCCGAAGACGTCGAGCTGGCACCTGGGCGCCG 
CGCCCTGGTACGGACGGGTGTTGCGGTCGCCGTCCCGTTCGGCATGGTCGGGCTGGTCCATC 
CGCGCTCCGGGTTGGCCACGCGGGTGGGGCTTTCGATCGTCAACAGTCCGGGCACCATCGAC
GGGGGTTATCGTGGGGAGATCAAGGTGGCCCTGATCAACTTGGACCCAGCCGCGCCCATCGTG 
GTACATCGCGGTGACXX3AATCGCCCAGTTGCTAGTGCAACGGGTTGAGTTGGTCGAGCTGGTC 
GAGGTCTCGTCGTTCGACGAGGCCGGGCTGGCCTCGACATCCCGCGGCGACGGTGGCCACGG 
TTCCTCCGGCGGACATGCGAGTTTGTGA 


>Rv2782c pepR protease/peptidase, M16 family (insulinase) TB.seq 3089045:3090358 MW:47074
>emb|AL123456|MTBH37RV:c3090358-3089042, pepR SEQ ID NO:105 

ATGCCGCGACGGTCACCAGCTGACCCCGCGGCGGCGCTGGCGCCGCGGCGCACCACCCTGC 
CGGGCGGGCTGCGAGTGGTCACCGAATTCCTGCCCGCGGTGCACTCCGCGTCGGTCGGGGTG 

TGGGTCGGCGTCGGATCGCGCGACGAAGGCX3CCACGGTGGCCGGGGCGGCGCACTTCCTTGA
GCATTTGCTGTTCAAGTCGACGCCCACCCGCTCTGCCGTGGACATTGCGCAGGCGATGGACGC 
GGTGGGCGGGGAACTGAACGCATTCACCGCCAAGGAGCACACCTGCTACTACGCCCACGTGCT 
CGGCAGCGACTTGCCGTTGGCCGTCGACCTGGTCGCCGATGTGGTGCTCAACGGCCGCTGTGC 
CGCCGACGATGTCGAGGTGGAACGTGACGTCGTCCTCGAGGAGATCGCGATGCGCGACGACG 

ACCCCGAGGACGCCTTGGCGGACATGTTCCTGGCGGCGTTGTTCGGCGACCACCCGGTCGGTC
GCCCGGTGATCGGCAGCGCGCAATCCGTGTCGGTGATGACGCGGGCTCAACTGCAATCGTTTC 
ACCTGCX3GCGCTATACCCCGGAGCGGATGGTCGTCGCGGCCGCCGGCAATGTGGATCACGAC 
GGGCTGGTTGCGTTGGTCCGCGAGCACTTCGGGTCCCGGTTGGTCCGGGGGAGACGGCCAGT 
TGCGCCGCGCAAGGGTACCGGCCGGGTCAACGGCAGCCCCCGGTTGACACTGGTTAGCCGCG 

ACGCCGAACAGACGCATGTGTCGCTGGGCATCCGCACACCCGGGCGCGGCTGGGAGCATCGT
TGGGCACTGTCGGTGCTGCACACCGCGCTGGGCGGTGGCTTGAGTTCCCGGCTGTTCCAGGAG 
GTCCGCGAGACCCGCGGGCTGGCCTACTCGGTCTACTCCGCGCTGGATCTGTTCGCCGACAGC 
GGCGCGCTTTCGGTGTACGCGGCCTGCCTGCCCGAACQCTTCGCCGACQTGATGCGGGTGAC

CGCCGATGTGCTGGAAAGCGTGGCACGCGACGGCATCACCGAGGCGGAATGCGGCATCGCCA 

AGGGATCGCTGCGGGGTGGGCTGGTGCTAGGGCTGGAGGATTCCAGCTCCCGGATGAGCCGG 

CTCGGCCGCAGCGAGTTGAACTACGGCAAGCACCGCAGCATCGAACACACCTTGCGGCAAATC 

GAGCAGGTCACCGTGGAGGAGGTCAACGCGGTGGCCCGCCACCTGCTGAGCAGGCGCTACGG 

TGCTGCCGTTCTTGGCCCACACGGATCGAAACGATCACTGCCGCAACAACTTCGAGCGATGGTA 

G6GTAG 

>Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase TB.seq 3090339:3092594 MW:79736
>emb|AL123456|MTBH37RV:c3092594-3090336, gpsI SEQ ID NO:106

ATGTCTGCCGCTGAAATTGACGAAGGCGTGTTCGAGACGACCGCCACCATCGACAACGGGAGC 
TTTGGCACCCGGACCATCCGCTTCGAGACCGGCCGATTGGCCTTGCAGGCCGCCGGCGCGGT 
GGTCGCCTACCTCGACGACGACAACATGCTGCTGTCGGCGACCACCGCCAGCAAGAACCCCAA 

agaacacttcgacttcttccccctcacggtcgacgtcgaggagcgcatgtatgcggccggccg 

catccccggttcgttcttccgtcgcgagggccgaccctccaccgacgcgatcctgacctgccg 

gctcatcgaccgcccgctgcgcccgtcgtttgtcgacgggctgcgcaacgagatccaaatcgt 

ggtgacgattctcagcctggatccgggcgatctctacgacgtattggcgatcaacgcggcgtc 

ggcgtccacccagctgggcggtctgccgttctccgggcccatcggcggtgtgcgggtggcgc 

tcatcgacggcacctgggtcggcttccccaccgtcgaccagAtcgagcgcgccgtgttcgaca 

tggtcgtggccggccggatcgtcgagggtgatgttgccatcatgatggtcgaagccgaggcca 

ccgaaaacgtcgtcgagctcgtcgaaggtggtgcccaagcgccgacggaaagcgtggtggcc 

gcgggcctggaggcggccaagccgtttatcgccgcgctgtgcaccgcgcagcaggagcttgc 

cgatgccgctggaaagtcgggcaaaccgaccgtcgacttcccggtgttccctgactacggcga 

agacgtgtactactcggtgtcctcggtggccaccgacgagttggccgccgcgttgaccatcgg 

cggtaaagccgagcgcgaccagcgcatcgacgaaatgaagacccaggttgtgcagcggctcgc 

cgacacctacgagggtcgcgaaaaggaggtcggcgccgcgttgcgtgccctgaccaaaaagct 

ggttcggcagcgcatcctcaccgaccatttccgtatcgacggccgcggcatcaccgacattcg 

cgcattgtcggccgaggtggccgtggttccgcgcgcgcacggcagcgcgctgttcgaacgcg 

gcgaaacccagatcctgggtgtgaccacactcgacatgatcaagatggcccagcagatcgact 

cgttggggccggagacatcgaagcggtacatgcaccactacaacttcccgccgttctccaccg 

gcgagaccggtcgggtcggttcgcccaagcggcgtgagatcgggcacggcgcactggccga 

gcgggccctggtgccggtgttgccgagcgtcgaggaattcccgtatgccattcgccaggtgtc 

ggaggctctgggctccaacgggtcgacctcgatggggtcggtgtgcgcgtcgacgctggcgc 

tgctcaacgccggggtgccgctcaaggcgccggtggccggcatcgcgatgggcctggtctcc 

gacgacattcaagtagaaggggcggtcgacggcgttgtggagcgtcgcttcgtcaccctcacc 

gacatcctcggcgccgaagacgcgttcggtgacatggacttcaaggtcgccgggaccaaggac 
TTCGTCACCGCGCTGCAGCTGGACACCMGCTCGACGGGATCCCTTCGCAGGTGCTTGCCGGA

GCACTCGAGCAGGCCAAGGACGCCCGCCTCACGATCTTGGAGGTGATGGCTGAGGCCATCGAT 

AGACCCGACGAAATGAGTCCCTACGCCCCGCGGGTGACCACCATCAAGGTTCCGGTGGACAAG 

ATCGGGGAGGTCATCGGACCCAAGGGCAAGGTCATCAACGCCATCACCGAGGAGACCGGCGC 

GCAGATCTCCATCGAAGACGACGGCACCGTGTTCGTCGGCGCCACCGACGGGCCATCGGCACA 

GGCCGCGATCGACAAGATCAACGCCATCGCCAACCCGCAGCTGCCGACGGTGGGCGAACGGT 

TCCTCGGMCCGTGGTCAAGACCACCGATTTCGGTGCCTTTGTATCGTTGCTGCCTGGCCGCGA 

CGGTCTGGTGCACATTTCCAAACTCGGCAAGGGCAAGCGCATCGCGAAGGTCGAGGACGTTGT 

CAATGTCGGTGACAAGCTGCGGGTGGAGATCGCCGACATCGACAAACGGGGCAAGATCTCCCT 

GATCCTGGTCGCCGACGAGGACAGCACCGCCGCCGCTACCGATGCCGCGACGGTCACCAGCT 

GA 

>Rv2793c truB tRNA pseudouridine 55 synthase TB.seq 3102364:3103257 MW:31821
>emb|AL123456|MTBH37RV:c3103257-3102361, truB SEQ ID NO:107

ATGAGCGCAACCGGCCCCGGAATCGTGGTTATCGACAAGCCCGCGGGAATGACCAGCCATGAC 

GTGGTGGGGCGGTGCCGCCGCATCTTCGCCACGCGGCGGGTCGGCCACGCGGGCACCCTGG 

ACCCGATGGCCACCGGGGTGTTGGTGATCGGCATCGAACGCGCCACCAAGATCCTCGGTCTGC 

TGACGGCGGCCCCCAAGTCGTATGCCGCCACCATCCGCTTGGGTCAGACCACTTCCAGCGAGG 

ACGCCGAAGGTCAAGTGCTGCAGTGGGTTCCGGCTAAGCACCTGACCATCGAGGCGATCGACG 

CCGCGATGGAGCGGCTGCGCGGTGAGATCCGGCAGGTGCCGTCGTCGGTCAGCGCGATCAAG 

GTCGGTGGCCGACGCGCCTATCGGTTGGCCCGCCAGGGGCGCTCCGTGCAATTGGAAGCCCG 

GCCGATCCGCATCGACCGGTTCGAGCTGCTGGCCGCACGCCGGCGCGACCAGCTCATCGATAT 

CGATGTGGAGATCGACTGCTCCTCGGGAACCTACATCCGCGCGTTGGCACGCGACCTCGGCGA 

CGCGCTTGGGGTGGGAGGCCATGTGACGGCGTTGCGGCGCACGCGCGTCGGCCGCTTCGAGC 

TGGACCAGGCGAGATCGCTCGACGATCTCGCGGAGCGCCCCGCGCTGAGCCTGAGCCTCGAT 

GAGGCCTGCCTGCTGATGTTTGCGCGCCGCGACCTGACCGCCGCGGAGGCCAGCGCGGCCGC 

CAACGGCCGGTCCCTGCCGGCGGTCGGTATCGACGGCGTGTACGCGGCCTGTGACGCCGACG 

GCCGGGTTATCGCGCTGCTGCGTGACGAGGGTTCGCGGACCAGGTCGGTGGCGGTGCTCCGC 

CCGGCGACGATGCACCCCGGGTAG 

>Rv2797c - TB.seq 3105619:3107304 MW:58761 >emb|AL123456|MTBH37RV:c3107304-3105616,
Rv2797c SEQ ID NO:108

GTGCCACTGACCGTGGCCGATATCGATCGGTGGAACGCGCAAGCGGTCCGGGAGGTGTTTCAC 
GCGGCCAGTGCCCGAGCGGAGGTGACGTTCGAGGCGTCGCGTCAGTTGGCCGCGCTGTGGAT
TTTTGCGAACTCGGGTGGCAAGACCGCTGAGGCGGCGGCACACCACAACGCGGGCATTCGCC 
GAGACCTCGACGCCCACGGCAACGAGGCGTTGGCGGTTGCCCGGGCGGCCGACAGGGCCGC 
CGACGGGATTGTGAAGGTTCAGTCCGAGCTGGCCGCACTACGCCATGCCGCCGCGGCCGCCG 
AGCTGACGATCGATGCGCTGATCAACCGGGTGGTGCCGATCCCCGGQCTGCGATCCACCGAG
GCGCAGTGGGCGCGGACGCTGGCCAAGCAAACGGAGCTGCAGGCGGAGCTGGATGCGATTAT 
GGCCGAGGCCAATGCCGTCGACGAGGAGCTGGCCTCAGCGGTCAATATGGCCGACGGTGACG 
CGCCCATCCCGGCCGATTCCGGCCCGCCGGTCGGTCCCGAGGGGCTGACCCCGACCCAGCTC 

GCCAGCGATGCCAACGAGGAGCGGCTGCGCGAGGAGCGCGCCCGCCTGCAGGCCCACCTCG
AGCGGTTACAGGCGGAGTATGACCAACTGAGTGTGCGGGCCGCCCGTGACTACCACAACGGCA 
TCCTCGACGGTGACGCGGTGGGCCGACTGGCAGCGCTTACCGACGAGCTGAGCGCCGCCAGG
GGCCGGCTGGGTGAGCTCGATGCCGTCGACGAGGCGTTGAGCCGAGCACCCGAGACCTACCT 
GACCCAGCTGCAGATTCCCGAGGACCCAAATCAGCAGGTGCTGGCGGGCGTGGCCGTCGGTAA 

TCCCGACACCGCCGCCAATGTGTCGGTGACGGTTCCCGGCGTCGGGTCCACCACCCGGGGCG
CCCTGCCCGGCATGGTGACCGAAGCCCGCGACCTGCGGTCGGAGGTAATCCGGCAACTCAATG 
CTGCCGGCAAGCCCGCATCGGTTGCCACCATCGCCTGGATGGGCTACCACCCGCCCCCGAACC 
CACTCGACACCGGCAGTGCGGGCGATCTGTGGCAGACCATGACCGATGGGCAGGCACACGCG 
GGCGCGGCCGATCTGTCGCGGTATTTGCAGCAGGTGCGCGCCAATAACCCCAGTGGCCACCTG 

ACCGTGTTGGGGCACTCGTATGGGTCGCTGACGGCGTCGCTGGCGTTGCAGGACCTCGATGCC
CAGAGCGCCCATCCGGTCAACGACGTCGTGTTTTACGGCTCACCCGGCTTGGAGCTGTACAGC 
CCGGCGCAGCTCGGGCTCGATCACGGGCACGCTTATGTCATGCAGGCCCCCCACGACCTCATC 
ACCAATCTGGTGGCGCCGTTGGCGCCGCTGCACGGATGGGGCCTGGACCCCTATCTGACCCCC 
GGGTTCACGGAGCTGTCGTCACAGGCGGGTTTTGATCCGGGCGGGATCTGGCGTGACGGAGT 

GTATGCCCACGGGGACTACCCGCGGTCCTTCCTCGATGCCGCCGGCCAGCCGCAGCTGCGGA
TGTCCGGCTATAACCTGGCGGCGATCGCCGCCGGGCTGCCCGACAACACGGTGGGCCCGCCG 
CTGCTTCCGCCAATTCTGGGTGGCGGCATGCCGGCAGCGCCCGGCCCAGCACTGAGAGGGGG 
ACGTTGA 

>Rv2864c ponA2 TB.seq 3175454:3177262 MW:63015
>emb|AL123456|MTBH37RV:c3177262-3175451, Rv2864c SEQ ID NO:109

ATGGTAACTAAAACAACATTAGCCTCAGCCACCTCAGGTTTGCTGCTGCTTGCGGTCGTCGCCAT 
GTCGGGCTGCACCCCGCGTCCCCAAGGGCCCGGTCCGGCGGCCGAAAAGTTCTTCGCCGCGC 
TGGCCATCGGTGACACCGCCTCCGCCGCCCAGCTCAGCGACAACCCCAACGAGGCGCGCGAA 

GCGCTGAACGCGGCCTGGGCGGGGCTGCAGGCCGCCCACCTGGATGCGCAGGTTCTCAGCGC
CAAGTACGCCGAGGACACCGGTACGGTCGC1TATCGCTTCAGCTGGCATCTGQCCAAGGACCG 
AATCTGGACCTATGACGGCCAGCTGAAGATGGCCCGCGACGAAGGGCGTTGGCACGTTCGCTG 
GACCACCAGCGGG7TGCATCCCAAGCTAGGCGAACATCAAACGTTCGCGCTACGAGCCGACCC 
GCCGCGGCGCGCCTCGGTGAACGAAGTCGGCGGCACCGATGTGCTGGTGCCGGGCTATCTGT 

ATCACTACTCGCTGGACGCCGGCCAGGCCGGGCGCGAGCTCTTCGGCACGGCACACGCGGTG
GTGGGCGCGCTGCACCCCTTCGACGACACGCTCAATGATCCGCAGCTGCTGGCCGAACAGGCC 
AGCTCGTCGACCCAGCCGTTGGACCTGGTCACGTTGCACGCCGACGACAGCAACCGGGTGGC
CGCGGCGATCGGGCAGCTGCCTGGCGTGGTGATCACACCGCAGGCCGAGCTGCTCCCGACCG
ACAAGCACTTCGCGCCGGCGGTCCTCAACGATGTCAAGAAGGCCGTCGTCGATGAACTCGACG 
GCAAGGCGGGTTGGCGGGTGGTGAGCGTCAACCAAAATGGCGTCGACGTCTCGGTGCTGCAG 
GAGGTCGCCCCATCACCTGCGTCGTCGGTTTCGATCACGTTGGATCGGGTCGTGCAAAACGCC 

GCGCAACACGCGGTGAACACCCGGGGCGGCAAGGCGATGATCGTGGTGATCAAGCCGTCGAC
CGGCGAGATCCTGGCGATCGCGCAGAACGCCGGGGCCGATGCGGACGGTCCGGTCGCGACCA 
CCGGTCTATATCCACCCGGGTCGACATTCAAGATGATCACCGCCGGTGCGGCCGTCGAGCGTG 
ACCTGGCTACCCCTGAGACGCTGCTGGGTTGCCCCGGGGAGATCGACATCGGGCATCGCACCA 
TTCCCAACTACGGTGGCTTTGATCTGGGCGTGGTGCCGATGTCACGCGCGTTTGCCAGTTCCTG 

CAACACCACCTTCGCCGAGCTGAGCAGCAGGCTGCCTGCCCGCGGTCTGACTCAGGCGGCCC
GGCGGTACGGGATCGGGCTTGACTACCAGGTGGACGGCATCACCACGGTGACCGGTTCGGTG 
CCGCCGACGGTGGACCTGGCCGAACGGACCGAGGACGGTTTCGGCCAGGGCAAGGTGCTGGC 
CAGCCCGTTCGGCATGGCCTTGGTGGCGGCGACGGTAGCCGCCGGGAAGACCCCGGTTCCAC 
AGCTGATCGCCGGCCGGCCGACGGCCGTCGAAGGCGATGCCACACCGATCAGCCAGAAGATG 

ATCGACGCGCTGCGGCCCATGATGCGGTTGGTGGTGACCAATGGCACCGCCAAGGAGATCGCT
GGCTGTGGCGAGGTGTTCGGTAAGACCGGCGAAGCCGAATTCCCGGGCGGATCGCATTCCTG 
GTTCGCCGGGTACCGTGGCGATCTGGCATTTGCGTCGCTGATCGTCGGGGGCGGTAGCTCGGA 
ATACGCGGTGCGGATGACCAAGGTGATGTTCGAATCGCTGCCGCCGGGGTACCTGGCGTAG 

>Rv2868c gcpE TB.seq 3179368:3180528 MW:40451
>emb|AL123456|MTBH37RV:c3180528-3179365, gcpE SEQ ID NO:110

GTGACTGTAGGCTTGGGCATGCCGCAGCCCCCGGCACCCACGCTCGCTCCCCGGGGCGCCAC 
CCGTCAGCTGATGGTCGGCAACGTCGGCGTGGGCAGTGACCATCCGGTCTCGGTGCAATCGAT 
GTGCACCACCAAAACCCACGACGTCAACTCGACATTGCAACAAATCGCCGAGCTGACCGCGGC 

CGGATGCGACATCGTGCGGGTGGCCTGCCCGCGCCAGGAGGACGCCGACGCGCTGGCCGAG
ATCGCCCGGCACAGCCAGATCCCGGTAGTCGCGGACATACATTTCCAGCCGCGCTACATATTCG 
CCGCCATCGACGCTGGATGTGCCGCGGTGCGGGTCAACCCGGGCAACATCAAGGAGTTTGACG 
GCCGGGTGGGTGAGGTCGCCAAGGCGGCGGGTGCGGCCGGGATCCCGATCCGAATGGGTGT 
CAACGCCGGTTCGCTGGACAAACGGTTCATGGAGAAGTATGGCAAAGCCACGCCCGAGGCGCT 

GGTTGAGTCGGCGCTGTGGGAGGCTTCGCTTTTCGAGGAGCATGGCTTCGGTGACATCAAGAT
CAGCGTCAAGCACAACGACCGGGTGGTGATGGTCGCCGCCTACGAGCTGGTTGCTGCACGGTG 
CGACTACCCACTGCACCTCGGTGTCACCGAGGCCGGCCCTGCTTTCCAGGGGACCATCAAGTC 
CGCGGTTGCCTTCGGCGCGTTGCTGTCGCGGGGCATAGGCGACACCATCCGGGTGTCGTTGTC 
GGCCCCGCCGGTCGAGGAAGTCAAGGTGGGCAATCAGGTTCTCGAGTCGTTGAAGCTGCGGCC 

GCGTTCGCTCGAGATCGTGTCTTGCCCGTCGTGCGGTCGCGCGCAAGTCGACGTCTAGACCCT
GGCCAACGAGGTAACCGCCGGCCTGGATGGTCTCGATGTGCCGTTGCGGGTGGCCGTGATGG 
GGTGTGTCGTCAATGGTCCGGGTGAAGCACGTGAGGCCGACCTGGGCGTGGCGTCCGGCAAC 
GGCAAAGGTCAGATCTTTGTACGGGGCGAAGTGATCAAGACCGTGCCCGAAGCACAGATCGTC

GAGACGCTGATCGAGGAGGCGATGCGGCTGGCCGCCGAAATGGGCGAGCAAGATCGGGGCGC 

GACACCGAGCGGTTCGCCTATTGTGACCGTAAGCTGA 

>Rv2869c - TB.seq 3180548:3181759 MW:42835
>emb|AL123456|MTBH37RV:c3181759-3180545, Rv2869c SEQ ID NO:111

ATGATGTTTGTTACCGGCATTGTGCTGTTCGCGCTCGCGATCCTGATTTCGGTGGCCCTGCACG 
AATGTGGTCACATGTGGGTCGCGCGCCGCACCGGGATGAAGGTACGTCGCTATTTCGTCGGCT 
TTGGCCCCACGTTGTGGTCGACCCGGCGCGGCGAGACCGAATACGGTGTCAAAGCCGTTCCGC 

TGGGCGGCTTCTGTGACATCGCCGGCATGACCCCGGTCGAGGAACTCGACCCCGACGAACGTG
ACCGTGCGATGTACAAGCAGGCCACCTGGAAGCGGGTCGCAGTGTTATTCGCCGGGCCCGGAA 
TGAACCTCGCTATCTGCCTGGTGCTGATCTATGCCATCGCGCTGGTCTGGGGGCTGCCTAACCT 
GCATCCGCCAACCAGGGCCGTAATCGGCGAAACTGGCTGCGTTGCACAGGAAGTGAGCCAGG 
GCAAGCTCGAGCAGTGCACCGGGCCCGGTCCGGCGGCGCTGGCCGGAATTCGCTCCGGTGAC 

GTCGTGGTCAAGGTCGGTGACACCCCGGTGTCCAGTTTCGACGAGATGGCCGCCGCGGTGCG
CAAGTCACACGGCAGCGTCCCGATCGTTGTCGAGCGTGACGGCACCGCGATTGTTACCTACGT 
GGACATCGAATCCACCCAACGCTGGATCCCTAACGGGCAGGGCGGTGAGCTCCAGCCGGCAAC 
GGTCGGTGCGATTGGGGTGGGCGCCGCCCGGGTCGGGCCTGTGCGCTACGGCGTGTTCTCCG 
CCATGCCGGCCACATTCGCGGTCACCGGCGACCTGACCGTGGAGGTGGGCAAGGCGCTGGCC 

GCCCfCCCGACCAAGGTAGGTGCGCTGGTGCGGGCGATCGGCGGCGGGCAGCGTGACCCGC
AGACGCCGATAAGTGTGGTGGGCGCCAGCATCATCGGCGGCGACACCGTCGACCATGGGCTG 
TGGGTGGCGTTCTGGTTCTTCTTGGCCCAGCTGAACCTCATCCTGGCTGCGATCAACCTGCTGC 
CGTTGCTGCCGTTCGATGGCGGCCATATTGCCGTCGCGGTGTTCGAGAGGATCCGCMCATGG 
TCCGGTCGGCTCGTGGCAAGGTGGCGGCCGCACGGGTGAATTACCTCAAACTCTTGCCGGCGA 

CCTATGTGGTCTTGGTTCTTGTCGTCGGGTACATGCTCTTGACCGTCACCGCCGACCTGGTCAA
CCCGATTAGGCTTTTCCAGTAG 



>Rv2870c - TB.seq 3181770:3183077 MW:45324 >emb|AL123456|MTBH37RV:c3183077-3181767, 
Rv2870c SEQ ID NO:112

GTGGCTACCGGTGGACGCGTCGTGATCCGGCGGCGCGGTGACAACGAGGTGGTGGCGCACAA
TGATGAGGTGACCAACTCGACCGACGGGCGCGCTGAGGGCCGGTTGCGGGTGGTGGTGCTGG 
GCAGTACCGGCTCGATCGGCACCCAGGCGCTTCAGGTCATCGCCGACAATCCGGACCGTTTCG 
AGGTAGTCGGGCTGGCCGCTGGCGGCGCCCATCTGGACACGTTGCTGCGACAACGTGCGCAG 
ACCGGGGTGACCAATATTGCCGTCGCTGACGAGCACGCGGCGCAGCGGGTCGGCGACATCCC 

CTACCACGGATCCGACGCCGCCACCCGGCTGGTCGAGCAGACCGAGGCCGACGTCGTCCTCA
ATGCGCTGGTCGGCGCGTTGGGCCTGCGACCGACGTTGGCCGCGCTCAAGACGGGTGCCCGG 
CTGGCGCTGGCCAACAAGGAATCGCTGGTCGCCGGTGGTTCGCTGGTGCTGCGGGCGGCGCG 
GCCCGGTCAGATCGTGCCGGTCGACTCCGAACACTCCGCGCTGGCCCAGTGCCTGCGCGGCG

GCACTCCCGACGAGGTCGCCAAGCTGGTGCTGACGGCCTCGGGAGGGCCGTTTCGGGGCTGG 

TCCGCGGCCGACCTCGAGCATGTCACCCCCGAGCAGGCTGGCGCGCATCCTACGTGGTCGATG 

GGCCCGATGAACACGCTGAATTCGGCGTCGCTGGTCAACAAGGGACTTGAGGTCATCGAAACC 

CACCTGCTGTTCGGCATCCCCTACGACCGCATCGATGTCGTGGTGGACCCCCAGTCGATCATCC 

ATTCGATGGTCACCTTCATCGACGGTTCGACGATCGCCCAGGCCAGTCCCCCGGACATGAAGCT 

ACCGATTTCGTTAGCGCTGGGCTGGCCGCGTCGGGTCAGCGGCGCCGCTGCTGCCTGTGATTT 

CGATACCGCGTCGAGCTGGGAGTTCGAGCCGTTGGACACCGACGTCTTCCCCGCGGTCGAGTT 

GGCCCGGCAGGCCGGCGTAGCCGGTGGCTGCATGACCGCGGTTTACAATGCGGCGAACGAAG 

AAGCAGCAGCGGCGTTCCTTGCTGGCCGGATCGGCTTCCCGGCCATCGTCGGCATCATCGCCG 

ACGTGTTGCACGCTGCCGACCAATGGGCCGTCGAACCCGCTACCGTGGATGACGTACTCGACG 

CGCAGCGCTGGGCCCGCGAGCGAGCGCAGCGCGCGGTATCTGGTATGGCTTCGGTGGCGATC 

GCAAGCACGGCGAAGCCGGGCGCAGCGGGTCGACACGCATCGACGTTAGAAAGGTCCTGA 

>Rv2922c smc member of Smc1/Cut3/Cut14 family TB.seq 3234189:3238055 MW:139610
>emb|AL123456|MTBH37RV:c3238055-3234186, smc SEQ ID NO:113

GTGGGTGCAG^GAGTCGGTiVcCGCTGGTGbACCCGCTG^GAGCGTT^ 

CCGGTTACGCGGCCAACCACGCCGACGGACGCGTGCTGGTGGTCGCCCAGGGTCCGCGCGCT 

GCGTGCCAGAAGCTGCTGCAGCTGCTGCAGGGCGACACGACACCGGGCCGCGTCGCCAAAGT 

CGTGGCCGACTGGTCGCAGTCGACGGAGCAGATCACCGGGTTCAGCGAGCGGTAATCTGGCC 

CCTCGTGTACCTCAAGAGTCTGACGTTGAAGGGCTTCAAGTCCTTCGCCGCGCCGACGACTTTA 

CGCTTCGAGCCGGGCATTACGGCCGTCGTTGGGCCCAACGGCTCCGGCAAATCCAATGTGGTC 

GATGCCCTGGCGTGGGTGATGGGGGAGCAGGGGGCAAAGACGCTGCGCGGCGGCAAGATGG 

AAGACGTCATCTTCGCCGGCACCTCGTCGCGTGCGCCGCTGGGCCGCGCCGAAGTCACCGTTA 

GCATCGACAACTCCGACAACGCACTGCCTATCGAATACACCGAGGTGTCGATCACCCGAAGAAT 

GTTTCGCGACGGTGCCAGCGAATACGAMTCMGGGCAGCAGTTGCCGTTTGATGGATGTGCA 

GGAGTTGCTGAGCGACTCCGGCATCGGCCGTGAGATGCATGTGATTGTTGGGCAAGGGAAGCT 

CGAGGAGATCTTGCAGTCGCGGCCTGAGGATCGGCGGGCGTTCATCGAGGAAGCCGCCGGTG 

TGCTCAAGCATCGCAAGCGCAAGGAAAAAGCTCTGCGCAAACTCGACACGATGGCGGCGAACC 

TGGCCCGGCTCACCGATCTGACCACCGAGCTCCGGCGTCAACTCAAACCGCTGGGCCGGCAG 

GCCGAGGCGGCGGAGCGTGCCGCGGCCATCCAAGCCGATCTGCGCGACGCCCGGCTGCGCCT 

GGCGGCCGACGACTTGGTAAGCCGCAGAGCCGAACGGGAAGCGGTCTTTCAGGCCGAGGCTG 

CGATGCGCCGCGAGCATGACGAGGCGGCCGCCCGGCTGGCGGTGGGATCCGAGGAGCTGGC 

GGCGCATGAGTCCGCGGTCGCCGAACTCTCGACGCGGGCCGAGTCGATCCAGCACACTTGGTT 

CGGGGTGTGTGCGCTGGCCGAACGGGTGGACGCTACGGTGCGCATCGCCAGCGAACGCGCCC 

ATCATCTCGATATCGAGCCGGTAGCGGTCAGCGACACCGACCCCAGAAAGCCCGAGGAGCTAG 

AAGCCGAGGCCCAGCAGGTGGCCGTCGCCGAGCAACAACTGTTAGCGGAGCTGGACGCGGCG 
CGTGCCCGACTCGATGCTGCCCGTGCAGAGCTGGCCGACCGGGAGCGCCGCGCCGCCGAGG

CCGACCGGGCACACCTGGCGGCGGTCCGGGAGGAGGCGGACCGCCGTGAGGGACTGGCGCG 

GCTGGCTGGCCAGGTGGAGACCATGCGGGCGCGTGTCGAATCGATCGATGAGAGCGTGGCAC 

GGTTGTCCGAGCGGATCGAGGATGCCGCAATGCGCGCCCAGCAGACCCGAGCCGAGTTCGAA 

ACCGTGCAGGGCCGCATCGGTGAACTGGATCAAGGCGAGGTCGGCCTGGATGAGCACCACGA 

GCGTACTGTGGCCGCGTTGCGGTTGGCCGACGAACGCGTCGCCGAGCTGCAATCCGCCGAAC 

GCGCCGCCGAACGCCAGGTGGCATCGCTACGGGCTCGCATCGATGCGCTGGCAGTGGGGCTA 

CAGCGCAAGGACGGCGCGGCGTGGCTGGCGCACAATCGCAGTGGCGCAGGGCTTTTCGGTTC 

GATCGCCCAATTGGTGAAGGTACGTTCCGGCTATGMGCGGCACTGGCCGCGGCGCTCGGGC 

CGGCGGCCGACGCACTTGCGGTGGACGGCCTGACTGCCGCGGGTAGTGCCGTCAGCGCACTC 

AAACAAGCCGACGGCGGTCGCGCGGTCCTCGTGCTGAGTGACTGGCCGGCCCCGCAAGCCCC

CCAATCCGCCTCGGGGGAGATGCTGCCTAGCGGCGCCCAGTGGGCCCTAGACCTGGTCGAGT 

CTGGACGGCAGTTGGTTGGCGCGATGATCGCCATGCTTTCGGGTGTCGCGGTGGTCAACGACC 

TGACTGAGGCAATGGGCCTGGTCGAGATTCGTCCGGAGCTACGCGCGGTCACCGTTGACGGTG 

ATCTGGTGGGCGCCGGCTGGGTCAGCGGCGGATCGGACCGCAAGCTGTCCACCTTGGAGGTC 

ACCTCCGAGATCGACAAGGCCAGGAGTGAGCTGGCCGCTGCCGAGGCGCTGGCGGCGCAATT 

GAATGCGGCCCTGGCCGGTGCGCTGACCGAGCAGTCCGCCCGCCAGGACGCGGCCGAGCAA 

GCCTTGGCCGCGCTTAACGAATCCGACACGGCCATCTCGGGGATGTACGAGCAGCTGGGCCGC 

CTCGGGCAGGAGGCCCGCGCGGCGGAAGAAGAGTGGAACCGGTTGCTGCAGCAGCGTACGGA 

ACAGGAAGCCGTGCGCACACAGACTCTCGACGACGTCATACAACTTGAGACCCAGCTGCGTAA 

GGCCCAGGAGACCCAACGGGTGCAGGTGGCCCAACCGATCGACCGCCAGGCGATCAGTGCCG 

CTGCCGATCGCGCCCGCGGTGTCGAAGTGGAAGCCCGGCTGGCGGTGCGCACCGCCGAGGAA 

CGCGCCAACGCGGTTCGCGGGCGGGCCGATTCGCTGCGCCGTGCGGCTGCGGCGGAACGTG 

AGGCGCGGGTGCGGGCTCAGCAAGCACGCGCCGCAAGACTGCATGCGGCCGCGGTGGCCGC 

AGCGGTCGCCGACTGCGGACGGCTGCTGGCCGGGCGGTTGCACCGGGCGGTGGACGGGGCG 

TCGCAACTGCGCGACGCGTCGGCCGCGCAACGTCAGCAGCGGTTAGCGGCGATGGCCGCGGT 

GCGCGACGAGGTGAACACGCTGAGCGCCCGAGTGGGGGAACTCACCGATTCGCTGCACCGCG 

ACGAGCTGGCTAACGCGCAGGCGGCGCTGCGTATGGAGCAGCTTGAGCAGATGGTGCTAGAG 

CAGTTCGGAATGGCGCCGGCCGAGTTGATCACCGAATACGGTCCACATGTGGCGCTACCACCG 

ACCGAGCTCGAGATGGCTGAGTTCGAGCAAGCCCGCGAACGCGGGGAGCAGGTGATTGCGCC 

CGCCCCCATGCCGTTCGACCGGGTTACCCAGGAGCGCCGGGCCAAACGCGCCGAGCGTGCGC 

TTGCCGAGTTGGGCAGGGTCAACCCGCTGGCGCTCGAAGAGTTTGCTGCCTTGGAGGAGCGCT 

ACAATTTCCTGTCCACCCAACTCGAGGATGTCAAGGCTGCCCGCAAGGATGTGCTGGGCGTCGT 

CGCCGATGTTGACGCCCGCATCCTGCAGGTGTTCAATGACGCGTTCGTAGACGTGGAACGCGA 

ATTTCGCGGCGTGTTCACCGCATTGTTCCCGGGTGGTGAAGGACGGCTGCGGCTGACCGAGCC 

CGACGACATGCTCACCACCGGCATCGAGGTCGAAGCCCGCCCGCCGGGCAAGAAGATTACCC 

GACTGTCTTTGCTCTCCGGTGGCGAGAAGGCGCTGACCGCGGTGGCGATGCTGGTGGCGATCT 
TTCGTGCCCGTCCATCGCCQTTCTACATCATGGACGAGGTGGAGGCCGCCGTCGACGACGTGA

ACCTGCGCCGACTGCTCAGCCTGTTCGAACAGCTGCGAGAGCAGTCGCAGATCATCATCATCAC 

CCACCAGAAGCCGACGATGGAGGTCGCGGACGCACTGTACGGCGTAACCATGCAGAACGACG 

GCATCACCGCGGTCATCTCGCAGCGCATGCGCGGTCAGCAGGTGGATCAGCTGGTTACCAATT 

CCTCGTAG 

>Rv2925c rnc RNAse III TB.seq 3239829:3240548 MW:25400
>emb|AL123456|MTBH37RV:c3240548-3239826, rnc SEQ ID NO:114

ATGATCCGGTCACGACAACCCCTGCTCGACGCACTCGGTGTGGACCTCCCGGACGAGCTGCTC 

TCACTGGCGTTGACCCACCGCAGCTACGCCTACGAGAACGGCGGGCTGCCGACCAACGAGCGT 

TTGGAGTTTCTCGGCGATGCCGTGCTAGGGCTGACCATCACCGACGCGCTGTTCCATCGTCATC 

CTGATCGGTCGGAGGGGGATCTGGCCAAACTGCGGGCCAGCGTAGTCAACACCCAGGCCCTG 

GCCGACGTCGCACGCCGCCTCTGTGCGGAAGGCCTCGGTGTTCACGTGCTATTGGGTCGCGGC 

GAGGCGAACACCGGCGGGGCCGACAAGTCCAGCATTCTGGCCGACGGTATGGAATCGCTGCT 

GGGCGCGATCTACCTGCAACACGGTATGGAGAAGGCCCGTGAGGTGATCCTGCGGCTGTTTGG 

CCCGTTGCTGGACGCCGCGCCGACCCTGGGTGCGGGATTGGATTGGAAGACCAGCTTGCAGG 

AGCTGACTGCAGCGCGAGGGCTGGGTGCGCCGTCATACCTGGTCACCTCCACCGGCCCGGAC 

CACGATAAGGAATTCACCGCGGTGGTTGTCGTGATGGACAGCGAATACGGTTCAGGAGTGGGC 

CGGTCCAAAAAAGAAGCCGAGCAAAAAGCCGCGGCGGCCGCTTGGAAAGCCCTGGAAGTGCTC 

GACAACGCCATGCCGGGCAAAACCTCCGCCTAA 

>Rv2934 ppsD TB.seq 3262245:3267725 MW:193317
>emb|AL123456|MTBH37RV:3262245-3267728, ppsD SEQ ID NO:115 

ATGACAAGTCTGGCGGAGCGCGCGGCGCAACTGTCGCCGAACGCGCGAGCGGCCCTGGCGCG 

CGAGCTCGTCCGTGCGGGTACGACCTTCCCGACCGACATCTGCGAGCCGGTGGCGGTGGTGG 

GCATCGGCTGTCGCTTTCCGGGGMTGTGACTGGGCCAGAGAGCTTTTGGCAGCTACTGGCCG 

ACGGTGTGGACACAATCGAGCAGGTGCCGCCTGATCGGTGGGATGCGGACGCGTTCTACGATC 

CCGATCCTTCGGCGTCGGGTCGGATGACGACGAMTGGGGTGGTTTCGTTTCCGATGTCGACG 

CGTTCGACGCCGACTTTTTCGGAATCACTCCTCGGGAAGCCGTGGCGATGGACCCGCAGCATC

GGATGCTGCTCGAGGTTGGCTGGGMGCGTTGGAGCACGCGGGTATTCCGCCGGATTCCTTGA 

GCGGCACTCGAACCGGCGTGATGATGGGTCTGTCGTCGTGGGACTACACGATCGTCAATATCG 

AGCGCAGAGCCGACATCGACGCGTACCTGAGCACCGGAACCCCGCACTGTGCCGCGGTGGGG 

CGGATCGCGTATCTGTTGGGATTGCGTGGTCCGGCCGTCGCCGTAGATACCGCTTGTTCGTCGT 

CGCTGGTGGCAATTCACTTGGCGTGTCAGAGCCTTCGCCTGCGTGAAACCGACGTGGCATTGG 

CGGGCGGGGTGCAGCTCACCTTGTCACCGTTCACCGCCATCGCGCTGTGCAAGTGGTCGGCGC 

TGTCACCGACCGGCCGATGCAACAGCTTCGACGCCAACGCGGATGGATTCGTGCGCGGCGAG 

GGCTGCGGCGTGGTGGTGCTCAAGCGGTTGGCCGACGCGGTGCGCGACCAGGACCGGGTGCT 
TGCGGTGGTCCGCGGTTCGGCAACTAACTCCGATGGTCGGTCCAACGGCATGACCGCACCGAA
CGCGCTGGCGCAGCGTGACGTGATCACATCCGCCCTCAAGCTTGCGGATGTTACCCCTGACAG 
CGTGAACTATGTCGAAACACACGGCACCGGAACGGTGTTGGGGGACCCCATCGAGTTCGAGTC 
GCTGGCGGCCACTTATGGCCTGGGTAAAGGCCAGGGCGAGAGCCCGTGCGCATTGGGGTCGG 
TCAAGACCAACATGGGCCACCTGGAGGCGGCCGCCGGTGTGGCTGGATTCATCAAGGCGGTGC
TGGCGGTGCAACGTGGGCACATTCCCCGCAACTTGCACTTCACCCGGTGGAACCCGGCCATCG 
ACGCGTCGGCGACGCGGCTGTTCGTGCCGACCGAAAGCGCCCCGTGGCCGGCGGCTGCCGGT 
CCACGCAGGGCTGCGGTGTCATCGTTCGGCCTCAGCGGGACCAACGCGCACGTGGTGGTCGA 
GCAGGCACCCGACACCGCAGTAGCCGCAGCCGGCGGCATGCCGTATGTTTCGGCGCTGAACG 

TCTCCGGCAAGACGGCCGCGCGGGTGGCGTCGGCGGCGGCGGTGCTGGCCGACTGGATGTC
GGGGCCGGGCGCGGCGGCACCACTGGCCGACGTGGCACACACGTTGAACCGGCACCGGGCC 
CGGCACGCCAAGTTCGCCACCGTCATCGCGCGTGACCGCGCCGAGGCGATCGCGGGGTTGCG 
AGCGCTGGCGGCCGGACAACCACGCGTTGGGGTGGTGGATTGCGACCAGCATGCCGGTGGGC 
CTGGCCGGGTTTTTGTGTATTCGGGTCAGGGCTCGCAGTGGGCGTCGATGGGCCAGCAGTTGC 

TGGCCAACGAACCGGCGTTCGCCAAGGCGGTAGCCGAGCTGGATCCGATATTCGTTGACCAGG
TTGGCTTTTCGCTGCAGCAAACGCTTATCGACGGCGACGAGGTGGTGGGCATCGACCGCATCC 
AGCCGGTGCTGGTCGGGATGCAGTTGGCGCTGACCGAGTTATGGCGGTCCTATGGGGTGATTC 
CAGATGCCGTGATCGGGCACTCGATGGGTGAGGTGTCGGCGGCAGTGGTGGCCGGCGCGTTG 
ACGCCCGAGCAGGGCTTGCGGGTCATCACCACCCGGTCGCGGTTGATGGCGCGGCTGTCGGG 

GCAGGGAGCGATGGCGCTGCTCGAGCTGGATGCCGACGCCGCCGAGGCGCTGATTGCCGGCT
ATCCGCAGGTGACGCTGGCGGTGCATGCGTCACCGCGCCAGACGGTGATCGCCGGGCCGCCC 
GAGCAGGTGGACACGGTGATCGCGGCGGTAGCGACGCAAAACCGGTTGGCGCGCCGCGTCGA
AGTCGACGTGGCCTCCCATCACCCGATCATCGATCCCATACTGCCCGAGTTGCGAAGCGCGTTA 
GCGGATTTGACTCCGCAGCCGCCGAGCATCCCGATCATTTCCACTACGTACGAAAGCGCGCAG 

CCGGTGGCGGATGCCGACTATTGGTCGGCCAACCTGCGCAACCCGGTGCGATTCCACCAGGCC
GTCACCGCCGCCGGTGTCGACCACAACACCTTCATCGAAATCAGCCCTCACCCCGTGCTCACG 
CACGGACTCACCGACACCCTGGATCCGGACGGCAGCCATACAGTCATGTCGACGATGAACCGC 
GAACTGGACCAGACGCTGTATTTCCACGCCCAACTCGCCGCGGTCGGTGTGGCTGCGTCCGAG 
CACACCACCGGTCGCCTTGTCGACCTGCCCCCCACACCGTGGCACCATCAGCGATTCTGGGTC 

ACGGATCGTTCGGCGATGTCCGAGCTGGCCGCGACCCACCCGCTCCTGGGCGCGCACATCGA
GATGCCGCGCAACGGAGACCATGTCTGGCAGACCGATGTCGGCACCGAGGTCTGTCCCTGGTT 
GGCAGACCACAAGGTGTTCGGTCAACCCATCATGCCGGCCGCGGGGTTCGCCGAGATCGCCTT 
GGCGGCGGCCAGCGAAGCCCTCGGCACAGCCGCCGACGCCGTCGCACCCAACATCGTGATCA 
ACCAGTTCGAGGTGGAGCAGATGCTGCCCCTCGACGGCCACACGCCGCTAACGACGCAGTTAA 

TTCGCGGCGGGGACAGCCAGATTCGGGTCGAGATCTATTCCCGCACGCGTGGCGGAGAGTTCT
GCCGACACGCCACGGCCAAGGTTGAACAATCGCCGCGCGAATGTGCGCACGCGCACCCGGAA 
GCCCAAGGTCCCGCCACCGGGACMCAGTGTCGCCGGCCGATTTTTATGCCCTGCTCCGCCAA 






ACCGGCCAACACCATGGTCCGGCGTTCGCGGCCTTAAGCCGGATCGTGCGCCTGGCCGATGGT 

TCCGCGGAAACCGAGATCAGCATTCCCGAGGAGGCGCCGCGCCATCCCGGGTATCGGCTGCA 

CCCCGTGGTATTGGATGCGGCATTGCAAAGCGTGGGTGCCGCGATACCCGACGGCGAGATGGC 

GGGGTCGGCGGAAGCCAGCTATCTGCCAGTGTCGTTCGAGACCATCCGGGTGTACCGCGACAT 

CGGTCGGCACGTCAGGTGTCGTGCCCACCTGACAAACCTCGACGGCGGCACCGGAAAGATGG 

GCAGGATCGTCCTAATCAACGACGCGGGCCACATAGCGGCCGAAGTGGACGGCATCTATCTGC 

GTCGTGTCGAACGCCGTGCGGTACCCCTGCCACTAGAGCAGAAGATCTTCGATGCCGAATGGA 

CCGAAAGCCCGATCGCAGCCGTGCCGGCTCCGGAGCCAGCTGCCGAGACGACGCGGGGAAGT 

TGGCTGGTACTCGCCGATGCAACGGTGGATGCGCCAGGCAAGGCCCAGGCCAAGTCGATGGC 

CGACGACTTCGTGCAGCAGTGGCGCTCACCGATGGGGCGGGTGCACACCGCCGATATCCACGA 

CGAATCGGCGGTGCTGGCCGCATTTGCAGAAACGGCAGGCGATCCCGAGCACCCGCCGGTTG 

GCGTGGTGGTGTTCGTCGGCGGTGCCTCGAGTCGACTGGACGACGAGCTGGCGGCGGCGCGC 

GACACGGTGTGGTCGATCACCACGGTGGTTCGTGCGGTCGTCGGCACGTGGCACGGCCGATCA 

CCGCGGCTATGGCTGGTCACCGGGGGCGGACTTTCCGTTGCCGACGACGAGCCGGGAACACC 

CGCGGCGGCTTCCTTGAAAGGGCTGGTGCGGGTGCTCGCCTTCGAGCACCCGGACATGCGCA 

CCACCCTGGTCGATCTGGACATCACACAAGACCCGCTGACCGCGCTGAGCGCGGAAGTGCGGA 

ATGCCGGGAGTGGGTCGCGCCATGATGACGTGATCGCGTGGCGCGGCGAGCGCAGGTTCGTC 

GAACGGCTGTCGCGCGGGACGATCGATGTATCCAAAGGGCATCCGGTGGTGCGCCAGGGAGC 

GTCGTACGTCGTCACCGGCGGCCTCGGCGGTCTCGGCCTGGTCGTCGCTCGTTGGCTGGTGG 

ACCGCGGCGCCGGCCGGGTGGTGCTGGGTGGCGGCAGCGATCGCACTGACGAGCAGTGCAAC 

GTCCTGGCCGAACTGCAGACCCGCGCCGAGATCGTGGTTGTCCGTGGCGACGTGGCATCGCC 

GGGGGTGGCAGAAAAGCTGATTGAGACGGCCCGACAGTCTGGGGGCCAATTGCGCGGCGTCG 

TGCAGGCCGCCGCGGTCATCGAAGACAGCCTGGTGTTCTCTATGAGCAGGGACAACCTAGAAC 

GGGTGTGGGCACCCAAGGCCACCGGTGCGCTGCGCATGCACGAAGCCACCGCTGACTGCGAG 

CTCGACTGGTGGCTCGGATTCTCTTCCGCCGCTTCGCTATTGGGTTCTCCCGGGCAAGCGGCGT 

ACGCGTGCGCCAGCGCGTGGCTGGACGCGCTGGTCGGATGGCGCAGGGCATCCGGCGTGCC 

GGCCGCGGTGATCAACTGGGGTCCGTGGTCGGAGGTAGGCGTCGCCCAGGCCTTGGTGGGGA 

GTGTTCTGGACACGATCAGTGTCGCAGAAGGCATCGAGGCTCTCGACTCATTGCTTGCCGCCGA 

CCGGATCCGCACTGGAGTGGCTCGGCTGCGTGCCGATCGGGCCCTGGTCGCATTCCCGGAGA 

TCCGCAGCATCAGCTACTTCACCCAGGTGGTCGAGGAGCTGGACTCGGGGGGTGACCTCGGCG 

ACTGGGGCGGGGGCGACGCGGTTGCCGACCTCGACCCGGGCGAGGCGCGGCGCGCGGTGAC 

CGAGCGGATGTGTGCGCGCATCGCTGCGGTGATGGGCTACAGTGACCAGTCGACTGTCGAACC 

CGCCGTGGCCTTGGACAAGCCCGTGACCGAGCTGGGGCTGGATTCTCTGATGGCGGTACGAAT 

ACGCAACGGCGCGCGGGCGGATTTCGGCGTGGAACCGCCGGTAGCGCTGATACTGCAAGGCG 

CGTCCTTGCATGACCTGACGGCGGACTTAATGCGCCAACTCGGGCTCAATGATCCCGATCCGG 

CGCTCAACAACGCTGACACTATTCGCGACCGGGCGCGCCAGCGCGCGGCAGCGCGACACGGA 

GCCGCGATGCGGCGCGGACCTAAACCTGAAGTACAGGGAGGATAA 
>Rv2946c pks1 TB.seq 3291503:3296350 MW:166642
>emb|AL123456|MTBH37RV:c3296350-3291500, pks1 SEQ ID NO:116

GTGATTTCGGCGAGATCGGCTGAGGCGTTGACGGCGCAGGCGGGTCGACTTATGGCCCACGTG 

CAGGCCAACCCAGGGCTGGATCCGATCGATGTGGGGTGCTCGTTGGCCAGTCGCTCGGTGTTT
GAGCACCGAGCGGTGGTGGTCGGCGCAAGCCGTGAGCAACTGATTGCCGGGCTGGCTGGGCT 
CGCGGCGGGCGAGCCGGGTGCCGGCGTGGCGGTCGGTCAGCCAGGGTCGGTGGGCAAGACG 
GTGGTCGTGTTTCCTGGGCAGGGCGCGCAGCGCATCGGGATGGGCCGCGAGTTGTACGGCGA 
GTTGCCCGTGTTTGCGCAGGCATTCGATGCGGTGGCCGACGAGTTGGACCGGCATCTGCGGTT 

GCCGCTGCGCGACGTTATTTGGGGTGCCGATGCGGATTTGCTTGACAGCACCGAATTTGCTCAG
CCCGCGTTGTTCGCGGTGGAGGTGGCATCGTTCGCGGTGTTGCGGGATTGGGGTGTGCTTCCG 
GACTTCGTCATGGGTCACTCCGTTGGAGAGCTGGCGGCGGCGCACGCGGCCGGTGTGTTGAC 
GTTGGCGGACGCGGCGATGGTGGTGGTGGCGCGGGGCCGGTTGATGCAGGCGCTGCCGGCA 
GGCGGTGCGATGGTGGCGGTGGCTGCGAGTGAGGACGAGGTGGAGCCGCTGGTGGGTGAGG

GTGTGGGGATGGCTGCGATCAACGCGCCCGAATCGGTGGTGATCTCCGGTGCGCAGGCCGCG
GCAAATGCGATTGCGGATCGGTTCGCCGCGCAGGGTCGGCGGGTGCACCAGTTGGCGGTCTC 
GCATGCGTTTCATTCGCCGTTGATGGAGCCGATGCTCGAGGAGTTCGCGCGTGTCGCGGCCCG 
GGTGCAGGCACGCGAGCCCCAGCTTGGGCTGGTGTCGAACGTGACGGGCGAGTTGGCCGGCC 
CTGATTTCGGGTCGGCGCAGTACTGGGTGGACCACGTTCGTCGGCCGGTGCGCTTCGCGGACA 

GTGCGCGTCATTTGCAGACCCTTGGGGCGACCCACTTCATCGAGGCCGGCCCGGGAAGTGGTT
TGACTGGCTCGATCGAGCAGTCCTTGGCCCCGGCTGAGGCGATGGTGGTGTCGATGCTGGGCA 
AAGACCGGCCCGAGCTGGCCTCGGCGCTCGGTGCTGCCGGTCAGGTGTTCACCACCGGTGTG 
CCGGTGCAGTGGTCGGCGGTGTTCGCCGGCTCGGGTGGACGGCGGGTGCAGCTGCCCACGTA 
TGCGTTTCAGCGACGGCGGTTTTGGGAGACGCCGGGCGCGGATGGGCCCGCCGATGCGGCCG 

GGTTGGGTCTGGGCGCGACCGAGCATGCCTTGTTGGGTGCGGTGGTCGAGCGGCCCGATTCT
GACGAGGTGGTGCTGACCGGCCGGTTGTCGCTTGCGGATCAGCCGTGGCTGGCCGACCACGT 
GGTGAACGGGGTGGTGCTGTTCCCCGGGGCGGGTTTTGTGGAGTTGGTGATCCGCGCCGGTG 
ATGAGGTGGGGTGCGCGCTCATCGAAGAGTTGGTGCTGGCCGCACCGTTGGTGATGCACCCGG 
GTGTCGGGGTTCAGGTGCAGGTGGTCGTCGGGGCTGCCGATGAATCCGGGCACCGTGCGGTG 

TCGGTGTATTCCCGCGGTGATCAATCCCAGGGTTGGTTGCTGAACGCCGAAGGCATGCTGGGG
GTGGCTGCCGCTGAGACGCCGATGGATTTGTCCGTGTGGCCGCCCGAGGGCGCGGAGAGTGT 
GGATATCTCGGACGGCTATGCGCAGTTGGCCGAGCGCGGTTATGCCTACGGCCCCGCGTTTCA 
GGGTCTGGTGGGGATCTGGCGGCGGGGGTCGGAGCTGTTCGCCGAAGTTGTAGCCCCCGGCG 
AGGCCGGCGTGGCCGTCGACCGAATGGGGATGCATCCGGCGGTGTTGGACGCGGTGCTGCAT 

GCCCTCGGGCTGGCCGTCGAGMGACCCAGGCGAGCACCGAGACGAGACTGCCGTTTTGCTG
GCGTGGGGTGTCGCTGCATGCCGGCGGCGCTGGACGGGTGCGGGCCCGCTTCGCGTCCGCG 
GGCGCGGATGCGATTTCCGTGGACGTCTGCGACGCCACTGGGCTGCCGGTGTTGACGGTGCG 




CTCGCTGGTTACTCGCCCGATAACCGCAGAACAGCTGCGCGCCGCCGTGACCGCGGCCGGCG
GTGCGTCCGATCAGGGGCCGCTGGMGTGGTGTGGTCGCCGATCTCGGTGGTCAGCGGCGGC 
GCTMCGGGTCCGCCCGACCTGCCCCGGTGTCTTGGGCGGACTTTTGCGCCGGCAGTGATGGT 
GACGCCAGTGTCGTGGTGTGGGAACTCGAGTCTGCCGGTGGCCAAGCATCCTCGGTGGTGGG 

CTCGGTGTATGCGGCCACCCACACCGCCCTGGAGGTGTTGCAGTCCTGGCTCGGCGCGGATCG
GGCGGCCACGTTGGTGGTGTTGACCCATGGTGGCGTGGGGCTGGCTGGCGAGGACATCAGCG 
ACCTGGCCGCCGCCGCGGTGTGGGGCATGGCGCGTTCCGCGCAGGCCGAAAATCCCGGCCG 
GATCGTGTTGATCGACACCGATGCGGCGGTGGATGCCTCGGTGCTAGCCGGCGTCGGGGAAC 
CCCAGCTGCTGGTGCGCGGCGGCACTGTGCACGCCCCCCGGCTGTCCCCGGCCCCGGCGTTG 

CTAGCGTTACCGGCGGCAGAGTCGGCGTGGCGATTGGCCGCCGGTGGTGGCGGGACCCTGGA
GGATTTGGTGATCCAGCCCTGCCGGGAGGTACAGGCACCGCTACAGGCGGGGCAGGTGCGCG 
TGGCGGTGGCGGCCGTCGGGGTCAACTTCCGCGATGTGGTGGCCGCCCTAGGGATGTATCCC 
GGCCAGGCCCCACCGCTGGGTGCCGAAGGCGCCGGGGTGGTGCTTGAGACCGGTCCCGAAGT 
GACCGATCTTGCCGTCGGTGACGCCGTGATGGGATTCCTGGGCGGGGCCGGTCCGCTGGCGG 

TGGTGGATCAGCAACTGGTTACCCGGGTGCCGCAAGGCTGGTCGTTTGCTCAGGCAGCCGCTG
TGGCGGTGGTGTTCTTGACGGCCTGGTACGGGTTGGCCGATTTAGCCGAGATCAAGGCGGGCG 
AATCGGTGCTGATCCATGCCGGTACCGGCGGTGTGGGCATGGCGGCTGTGCAGCTGGCTCGC 
CAGTGGGGCGTGGAGGTTTTCGTCACCGCCAGCCGTGGCAAGTGGGACACGCTGCGCGCCAT 
GGGGTTTGACGACGACCATATCGGCGATTCCCGCACATGCGAGTTCGAGGAGAAGTTCCTGGC 

GGTCACCGAGGGCCGCGGGGTTGATGTGGTGCTCGACTCGCTGGCCGGTGAGTTCGTGGATG
CGTCGCTGCGCTTACTGGTCCGCGGTGGGCGTTTCCTCGAGATGGGCAAGACGGATATCCGCG 
ATGCGCAGGAGATCGCCGCTAATTATCCCGGCGTGCAGTATCGGGCGTTCGACCTGTCGGAGG 
CCGGCCCGGCACGCATGCAGGAGATGTTGGCCGAGGTGCGGGAGCTGTTCGACACCCGGGAG 
CTGCACCGGCTACCGGTCACCACGTGGGATGTGCGCTGCGCCCCGGCGGCCTTCCGGTTCATG 

AGCCAGGCCCGCCATATCGGCAAGGTTGTCTTAACCATGCCCTCGGCGTTGGCCGACCGGCTT
GCCGACGGCACGGTGGTGATCACCGGTGCCACCGGGGCGGTTGGTGGGGTGTTGGCCCGCCA 
CCTGGTTGGCGCCTATGGGGTGCGTCATCTGGTGTTGGCCAGTCGGCGGGGCGATCGCGCGG 
AGGGAGCGGCCGAATTGGCCGCCGACTTGAGGGAGGCCGGCGCCAAGGTGCAGGTGGTGGC 
CTGTGACGTGGCCGATCGCGCTGCGGTAGCGGGGTTGTTTGCCCAGCTGTCGCGGGAGTACCC 

GCCGGTGCGCGGGGTGATTCATGCCGCCGGCGTGCTCGATGACGCAGTGATCACCTCGTTGAC
ACCGGACCGCATCGATACGGTGTTGCGGGCCAAGGTGGACGCGGCGTGGAACCTGCACCAGG 
CCACCAGTGACCTGGATTTGTCGATGTTTGCGCTGTGCTCATCGATCGCGGCCACGGTCGGCTC 
GCCGGGGCAGGGCAACTACTCGGCGGCAAACGCGTTTCTGGACGGGTTGGCCGCTCACCGGC 
AGGCCGCAGGGTTGGCCGGGATATCACTGGCGTGGGGTTTGTGGGAACAGCCTGGCGGCATG 

ACCGCGCATTTGAGCAGCCGAGATCTGGCCCGCATGAGCCGCAGCGGGCTGGCTCCGATGAG
CCCTGCCGAAGCGGTGGAATTGTTTGACGCTGCGCTGGCCATCGATCACCCTCTGGCGGTGGC 
CACGCTCTTGGACCGGGCTGCACTAGACGCCCGGGCCCAGGCCGGTGCGTTGCCGGCGCTGT 
TCAGCGGGCTCGCGCGCCGCCCACGCCGACGCCAAATCGACGACACCGGTGACGCCACCTCG
TCGAAGTCGGCGCTGGCTCAACGCCTACACGGGCTGGCCGCGGACGAACAACTCGAGCTGCTA 
GTGGGGCTGGTGTGTCTGCAGGCAGCGGCAGTGCTGGGTAGGCCCTCCGCCGAGGACGTCGA 
CCCCGACACCGAATTCGGCGACCTCGGTTTCGACTCATTAACGGCTGTGGAGTTACGCAACCGC 
CTCAAAACCGCCACCGGACTGACGCTGCCACCTACCGTGATTTTGGATCATCCCACTCCCACTG
CGGTCGCCGAGTATGTCGCCCAGCAAATGTCTGGCAGCCGCCCAACGGAATCCGGTGATCCGA 
CGTCGCAGGTTGTCGAACCCGCCGCCGCGGAAGTATCGGTCCATGCCTAG 

>Rv3014c ligA DNA ligase TB.seq 3372545:3374617 MW:75258 

>emb|AL123456|MTBH37RV:c3374617-3372542, ligA SEQ ID NO:117

GTGAGCTCCCCAGACGCCGATGAGACCGCTCCCGAGGTGTTGCGGCAGTGGCAGGCACTGGC 
CGAGGAGGTGCGTGAGCACCAGTTCCGTTATTACGTGCGGGACGCGCCGATCATCAGCGACGC 
GGAATTCGACGAGCTGCTGCGCCGTCTGGAAGCCCTCGAGGAGCAGCATCCCGAGCTGCGCA 
CGCCCGATTCGCCGACCCAGCTGGTCGGCGGTGCCGGCTTCGCCACGGATTTCGAGCCCGTC 

GACCATCTCGAACGAATGCTCAGCCTCGACAACGCGTTCACCGCGGACGAACTCGCCGCCTGG
GCCGGCCGCATCCATGCCGAGGTCGGAGACGCCGCACATTACCTGTGTGAGCTCAAGATCGAC 
GGCGTCGCGCTGTCTTTGGTCTACCGCGAGGGACGGCTGACCCGGGCCTCCACCCGCGGCGA 
CGGGCGCACCGGCGAGGACGTCACCCTGAACGCCCGGACCATCGCCGACGTTCCCGAACGGC 
TCACCCCCGGCGACGACTACCCGGTGCCCGAGGTCCTCGAGGTCCGCGGCGAGGTCTTCTTCC 

GGCTGGACGACTTCCAGGCGCTCAACGCCAGCCTCGTCGAGGAGGGCAAGGCGCCGTTCGCC
AACCCCCGCAACAGCGCGGCGGGATCGCTGCGCCAGAAAGACCCGGCGGTCACCGCGCGCCG 
CCGGCTGCGGATGATCTGCCACGGGCTGGGCCACGTGGAGGGGTTTCGCCCGGCCACCCTGC 
ATCAGGCATACCTGGCGTTGCGGGCATGGGGACTGCCGGTTTCCGAACACACCACCCTGGCAA 
CCGACCTGGCCGGTGTGCGCGAGCGCATCGACTACTGGGGCGAGCACCGCCACGAGGTGGAC 

CACGAAATCGACGGCGTGGTGGTCAAAGTCGACGAGGTGGCGTTGCAGCGCAGGCTGGGTTC
CACGTCGCGGGCGCCGCX5CTGGGCCATCGCCTACAAGTACCCGCCCGAGGAAGCGCAGACCA 
AGCTGCTCGACATCCGGGTGAACGTCGGCCGCACCGGGCGGATCACGCCGTTTGCGTTCATGA 
CGCCGGTGAAGGTGGCCGGGTCGACGGTGGGACAGGCCACCCTGCACAACGCCTCGGAGATC 
AAGCGCAAGGGCGTGCTGATCGGCGACAGGGTGGTGATCCGCAAGGCCGGGGACGTGATCCC 

CGAGGTGCTGGGACCCGTCGTCGAACTGCGCGATGGCTCCGAACGCGAATTCATCATGCCCAC
CACCTGCCCGGAGTGCGGTTCGCCGTTGGCGCCGGAGAAGGAAGGCGACGCGGACATCCGTT 
GCCCCAACGCCCGCGGCTGCCCGGGGCAACTGCGGGAGCGGGTTTTCCACGTCGCCAGCCGC 
MCGGCCTAGACATCGAGGTGCTCGGTTACGAGGCGGGTGTGGCGCTCTTGCAGGCGAAGGT 
GATCGCCGACGAGGGCGAGCTGTTCGCGCTGACCGAGCGGGACTTGCTGCGCACCGACCTGT 

TCCGAACCAAGGCAGGCGAACTGTCGGCCAACGGCAAACGGCTGCTGGTCAACCTCGACAAGG
CCAAGGCGGCACCGCTGTGGCGGGTGCTGGTGGCGCTGTCCATCCGCCATGTCGGGCCGACG 
GCGGCCCGCGCCCTGGCCACCGAG7TCGGCAGCCTTGACGCCATCGCGGCGGCGTCCACCGA 
CCAGCTGGCCGCCGTCGAGQGGGTGGGGCCGACCATTGCCGCCGCGGTCACCGAGTGGTTCG
CCGTCGACTGGCACCGCGAGATCGTCGACAAGTGGCGGGCCGCCGGGGTGCGAATGGTCGAC 
GAGCGTGACGAGAGTGTGCCACGCACGCTGGCCGGGCTGACCATCGTGGTCACCGGCTCGCT 
GACCGGTTTCTCCCGCGACGACGCCAAGGAGGCGATCGTGGCCCGCGGCGGCAAGGCCGCCG 
GCTCGGTGTCGAAGAAGACCAACTATGTCGTCGCCGGAGACTGGCCGGGATCCAAATACGACA
AGGCGGTGGAGTTGGGGGTGCCGATTCTGGACGAGGATGGGTTCCGGAGACTGCTGGCCGAC 
GGACCCGCGTCACGAACGTAA 

>Rv3025c - NifS-like protein TB.seq 3383885:3385063 MW:40948 

>emb|AL123456|MTBH37RV:c3385063-3383882, Rv3025c SEQ ID NO:118

ATGGCCTACCTGGATCACGCTGCCACCACCCCGATGCACCCCGCCGCCATCGAGGCGATGGCG 
GCCGTGCAGCGCACCATCGGCAATGCGTCGTCGCTGCACACCAGCGGGCGCTCGGCGCGCCG 
GCGGATCGAGGAGGCCCGTGAGCTGATCGCGGACAAGCTAGGCGCTCGTCCGTCCGAGGTGA 
TCTTCACCGCGGGCGGCACCGAAAGCGACAACCTGGCTGTCAAAGGTATCTATTGGGCACGCC 

GCGATGCGGAGCCGCACCGCCGTCGCATCGTCACCACCGAGGTGGAACACCACGCCGTACTG
GACTCGGTGAACTGGCTCGTGGAACACGAAGGCGCCCATGTGACCTGGCTGCCGACCGCCGC 
CGACGGCTCGGTGTCGGCAACTGCGCTGCGCGAGGCACTGCAGAGCCACGACGACGTCGCGC 
TGGTATCGGTGATGTGGGCCAACAACGAGGTCGGAACTATTCTACCGATCGCCGAAATGTCAGT 
TGTCGCCATGGAATTCGGCGTGCCGATGCACAGTGATGCCATTCAGGCGGTGGGACAGCTCCC 

GCTTGACTTCGGGGCCAGCGGGCTGTCGGCGATGAGCGTGGCCGGGCACAAATTCGGTGGCC
CGCCAGGAGTGGGTGCGTTGCTGCTGCGCCGCGACGTCACCTGCGTGCCCCTTATGCACGGC 
GGTGGGCAGGAGCGCGATATTCGTTCCGGCACACCCGATGTCGCCAGTGCAGTTGGAATGGCG 
ACGGCCGCGCAGATCGCGGTGGACGGACTCGAGGAAAACAGCGCGCGGTTACGGCTGCTGCG 
GGATCGTCTGGTCGAGGGTGTGCTGGCTGAGATTGACGATGTTTGCCTTAACGGCGCCGATGA 

CCCGATGCGGCTAGCGGGTAACGCGCACTTCACTTTCCGTGGCTGCGAAGGCGATGCGCTGTT
GATGTTGTTGGACGCTAACGGAATCGAGTGCTCAAGCGGATCGGCCTGCACGGCAGGTGTAGC 
GCAGCCCTCGCATGTGTTGATTGCAATGGGCGTCGACGCGGGCAGCGCCCGCGGATCATTGCG 
TCTCTCGCTGGGGCACACCAGTGTTGAGGCTGATGTCGATGCCGCGTTGGAGGTGCTTCCCGG 
GGCGGTGGCACGTGCACGGCGGGCCGCCCTAGCCGCCGCGGGAGCATCCCGATGA 


>Rv3080c pknK serine-threonine protein kinase TB.seq 3442656:3445985 MW:119420
>emb|AL123456|MTBH37RV:c3445985-3442653, pknK SEQ ID NO:119 

ATGACCGACGTTGATCCGCACGCGACGCGGCGGGACCTGGTCCCGAATATTCCCGCGGAACTG 
CTTGAGGCTGGATTCGACAATGTCGAGGAGATCGGGCGCGGCGGATTCGGCGTCGTCTACCGC 
TGCGTCCAGCCCTCGCTGGACCGCGCCGTCGCCGTCAAGGTATTGAGCACCGACCTGGATCGG
GACAATCTCGAGCGCTTCCTGCGCGAGCAGCGGGCCATGGGCCGCCTTTCCGGGCACCCGCA 
CATCGTGACCGTCTTGCAGGTGGGCGTGTTGGCGGGTGGGCGGCCCTTCATCGTGATGCCCTA 
CCACGCCAAGAATTCGTTGGAGACGCTGATTCQCCGGCACGGGCCGCTGGACTGGCGCGAGA
CGCTGTCGATCGGCGTCAAGCTCGCGGGAGCGCTGGAAGCCGCGCATCGCGTCGGCACCCTG 
CACCGTGACGTGAAGCCGGGGAATATCCTGCTGACCGACTACGGGGAACCGCAGCTGACCGAT 
TTCGGAATCGCCAGAATCGCCGGGGGTTTCGAGACGGCGACCGGGGTGATTGCCGGTTCCCCG 

GCTTTCACCGCGCCGGAAGTTCTCGAAGGAGCATCGCCGACGCCCGCCTCTGACGTGTACTCC
CTGGGCGCGACGTTGTTCTGTGCGCTGACCGGCCATGCCGCCTACGAGCGCCGCAGCGGTGA 
GCGGGTGATCGCCCAGTTCCTGCGGATCACCTCGCAGCCGATCCCCGACCTGGGGAAGCAGG 
GACTGCCCGCGGACGTGGCCGCCGCCATCGAACGGGCGATGGCCCGCCATCCGGCGGATCGT 
CCCGCGACCGCGGCAGACGTTGGCGAGGAGCTTCGCGACGTTCAGCGCGGCAACGGCGTCAG 

CGTCGACGAGATGCCCCTCCCCGTCGAGCTGGGCGTGGAACGCCGACGCTCGCCCGAGGCGC
ACGCGGCGCATCGGCATACCGGCGGCGGCACCCCGACGGTCCCGACGCCTCCGACACCCGCG
ACCAAGTACCGGCCGTCGGTGCCCACCGGCTCGCTGGTCACCCGCAGCCGGCTCACCGACAT 
CCTGCGCGCCGGCGGACGGCGCCGGCTGATCCTCATCCACGCGCCCTCGGGATTCGGCAAAA 
GCACCCTGGCGGCGCAATGGCGGGAAGAGCTCTCGCGCGACGGCGCCGCGGTCGCCTGGCT 

GACAATCGACAACGACGACAACAACGAGGTGTGGTTCTTGTCGCACCTGCTCGAGTCGATCCG
GCGGGTCCGGCCCACGCTGGCCGAGTCGTTGGGGCACGTGCTCGAAGAGCATGGGGATGACG 
CCGGCCGCTACGTGTTGACTTCGCTGATCGACGAAATCCACGAAAACGACGACCGGATCGCGG 
TGGTGATCGACGACTGGCATCGGGTGTCCGACAGCCGCACCCAAGCTGCCCTGGGTTTCCTGC 
TGGACAACGGATGTCACCACCTGCAGCTCATCGTGACCAGCTGGTCTCGCGCCGGTTTGCCGG 

TGGGCAGGTTGCGGATCGGCGACGAACTAGCCGAGATCGATTCGGCTGCTTTGCGCTTCGATA
CCGACGAGGCCGCCGCGCTGCTGAACGATGCTGGTGGTCTGCGATTGCCGCGCGCAGACGTG 
CAGGCGCTGACTACCTCTACCGACGGGTGGGCCGCGGCGCTGCGGCTGGCCGCGCTGTCGCT 
GCGCGGCGGGGGCGACGCGACCCAACTCCTGCGCGGACTTTCCGGCGCCAGTGACGTGATCC 
ACGAATTCCTGAGCGAAAACGTGCTGGACACCCTGGAACCCGAAGTGCGCGAATTCCTACTGGT 

GGCATCGGTCACCGAACGCACGTGCGGCGGGCTGGCCTCGGCGCTGGCCGGGATCACCAATG
GGCGGGCGATGCTGGAAGAGGCCGAGCACCGCGGCTTGTTCCTGCAACGGACCGAAGACGAC 
CCGAATTGGTTTCGCTTCCACCAAATGTTCGCCGACTTTCTCCACCGTCGCCTCGAACGTGGCG 
GGTCGCACCGGGTGGCGGAACTGCACCGCAGGGCATCGGCCTGGTTCGCCGAGAACGGCTAC 
CTGCACGAAGCCGTCGACCATGCACTGGCCGCGGGCGATCCCGCGCGCGCCGTCGATCTTGT 

CGAGCAGGATGAAACGAACCTGCCGGAGCAGTCAAAGATGACCACACTTCTGGCAATCGTGCA
GAAACTGCCGACGTCGATGGTGGTTTCACGGGCCCGGCTCCAACTCGCCATCGCGTGGGCGAA 
CATTCTGCTGCAACGGCCGGCGCCGGCCACCGGTGCCCTGAATCGTTTCGAAACGGCCCTTGG 
CCGGGCCGAGCTTCCCGAGGCGACGCAGGCGGATCTGCGGGCCGAGGCAGACGTGTTGCGG 
GCGGTCGCCGAGGTGTTCGCAGACCGGGTCGAGCGCGTGGATGACCTTCTCGCCGAGGCAAT 

GTCGAGACCGGACACCCTGCCCCCGCGAGTCCCCGGGACCGCCGGCAACACCGCGGCGTTGG
CCGCGATCTGCCGCTTCGAGTTCGCCGAGGTATATCCACTGCTGGACTGGGCCGCGCCCTACC 
AGGAAATGATGGGACCGTTCGGCACCGTTTATGCGCAGTGCTTGCGCGGCATGGCGGCCAGGA
ATCG6CTCGACATTGTCGCTGCGCTACAGAACTTCCGAACGGCGTTCGAGGTCGGCACGGCAG
TGGGGGCCCACTCGCACGCGGCGCGGCTTGCGGGTTCGCTGCTCGCCGAATTGCTCTACGAG 
ACCGGCGATCTGGCCGGGGCTGGTCGTCTCATGGACGAGAGCTATCTGCTGGGTTCCGAGGG 
GGGTGCAGTGGACTACCTGGCCGCCAGGTACGTGATCGGCGCGCGGGTCAAGGCGGCCCAGG 
GGGATCATGAGGGTGCGGCTGATCGCCTGTCCACCGGAGGCGATACTGCCGTCCAGCTGGGG
CTGCCGCGCCTGGCTGCCCGAATCAACAACGAGCGGATCCGGCTGGGCATCGCGCTACCTGC 
GGCGGTGGCCGCCGATTTGCTGGCACCCCGCACCATCCCCCGCGACAATGGAATCGCCACCAT 
GACAGCCGAACTCGACGAGGACTCCGCGGTGCGCCTGTTGTCCGCCGGCGACTCCGCCGATC 
GTGACCAAGCCTGCCAACGGGCCGGTGCTCTCGCCGCCGCCATCGACGGTACGCGCAGACCG 
CTGGCGGCGCTGCAGGCGCAAATACTTCATATCGAAACGCTTGCCGCCACCGGACGGGAATCC
GATGCGCGAAACGAACTGGCGCCGGTAGCCACGAAGTGCGCCGAACTCGGGCTGTCACGTCT 
GCTGGTCGATGCGGGACTGGCCTAA 

>Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase TB.seq 3474004:3475371 MW:49342
>emb|AL123456|MTBH37RV:3474004-3475374, fprA SEQ ID NO:120

ATGCGTCCGTATTACATCGCCATCGTGGGCTCCGGGCCGTCGGCGTTCTTCGCCGCGGCATCC 
TTGCTGAAGGCCGCCGACACGAGCGAGGACCTCGACATGGCCGTCGACATGCTGGAGATGTTG 
CCGACTCCCTGGGGGCTGGTGCGCTCCGGGGTCGCGCCGGATCACCCCAAGATCAAGTCGAT 
CAGCAAGCAATTCGAAAAGACGGCCGAGGACCCCCGCTTCCGCTTCTTCGGCAATGTGGTCGT 

CGGCGAACACGTCCAGCCCGGCGAGCTCTCCGAGCGCTACGACGCCGTGATCTACGCCGTCG
GCGCGCAGTCCGATCGCATGTTGAACATCCCCGGTGAGGACCTGCCGGGCAGTATCGCCGCC 
GTCGATTTCGTCGGCTGGTACAACGCACATCCACACTTCGAGCAGGTATCACCCGATCTGTCGG 
GCGCCCGGGCCGTAGTTATCGGCAATGGAAACGTCGCGCTAGACGTGGCACGGATTCTGCTCA 
CCGATCCCGACGTGTTGGCACGCACCGATATCGCCGATCACGCTTTGGAATCGCTACGCCCAC 

GCGGTATCCAGGAGGTGGTGATCGTCGGGCGCCGAGGTCCGCTGCAGGCCGCGTTCACCACG
TTGGAGTTGCGCGAGCTGGCCGACGTCGACGGGGTTGACGTGGTGATCGATCCGGCGGAGCT 
GGACGGCATTACCGACGAGGACGCGGCCGCGGTGGGCAAGGTCTGCAAGCAGAACATCAAGG 
TGCTGCGTGGCTAT6C6GACCGCGAACCCCGCCCGGGACACCGCCGCATGGTGTTCCGGTTCT 
TGACCTCTCCGATCGAGATCAAGGGCAAGCGCAAAGTGGAGCGGATCGTGCTGGGCCGCAACG 

AGCTGGTCTCCGACGGCAGCGGGCGAGTGGCGGCCAAGGACACCGGCGAGCGCGAGGAGCT
GCCAGCTCAGCTGGTCGTGCGGTCGGTCGGCTACCGCGGGGTGCCCACGCCCGGGCTGCCGT 
TCGACGACCAGAGCGGGACCATCCCCAACGTCGGCGGCCGAATCAACGGCAGCCCCAACGAAT 
ACGTCGTCGGGTGGATCAAGCGCGGGCCGACCGGGGTGATCGGGACCAACAAGAAGGACGCC 
CAAGACACCGTCGACACCTTGATCAAGAATCTTGGCAACGCCAAGGAGGGCGCCGAGTGCAAG 

AGCTTTCCGGAAGATCATGCCGACCAGGTGGCCGACTGGCTAGGAGCACGCCAGCCGAAGCT6
GTCACGTCGGCCCACTGGCAGGTGATCGACGCTTTCGAGCGGGCCGCCGGCGAGCCGCACGG 
GCGTCCCCGGGTCAAGTTGGCCAGCCTGGCCGAGCTGTTGCGGATTGGGCTCGGCTGA

>Rv3235 - TB.seq 3611296:3611934 MW:22659
>emb|AL123456|MTBH37RV:3611296-3611937, Rv3235 SEQ ID NO:121

ATGATGGCCAGCAACCAAACCGCTGCGCAACACTCGTCTGCCACTCTCCAGCAGGCTCCTCGTT 
CGATCGATGATGCTGGAGGGTGCCCCTTGACCATCAGTCCTATCGCGAACTCACCGGGCGACA
CCTTCGCCGTCACACCCGTCGTCGAGTACGAGCCGCCGCCGGGAAACATCCCGCCGTGCGGG 
GAATCATCGCACGCAGCCCGGCGGCCGCACACCCCGCAGCTAGCTCGCCGACAACCAATCAGG 
CCGAGCGGCCGGGCACCGGCAGCGGTCACCTCCACGGCCAAGTCACCGCGGCTGCGTCAAGC 
GGGGACCTTCGCCGATGCCGCGCTACGCCGAGTGCTGGAGGTCATCGACCGCCGCCGCCCGG 
TGGGCCAGCTGCGCCCCCTGCTGGCACCCGGCCTCGTCGACTCCGTGCTCGCGGTGAGCCGC
ACGGCGGCCGGACACCAACAAGGCGCGGCCATGCTGCGCCGCATCCGGCTGACACCGGCCGG 
ACCCGACACGGCGGACACCGCCGCCGAGGTCTTCGGCACCTACAGTCGCGGGGACCGGATCC 
ATGCGATCGCCTGCCGGGTGGAACAACGGCCCGCCGGTAACGAAACCCGATGGCTGATGGTC 
GCCCTGCACATCGGGTGA 

>Rv3255c manA mannose-6-phosphate isomerase TB.seq 3635040:3636263 MW:43340 
>emb|AL123456|MTBH37RV:c3636263-3635037, manA SEQ ID NO:122

GTGGAACTGCTACGTGGCGCGTTACGCACCTACGC7TGGGGATCGCGCACCGCTATCGCCGAA 
TTCACCGGGCGTCCGGTGCCGGCCGCTCACCCCGAGGCCGAACTATGGTTCGGTGCACACCC 

GGGTGATCCGGCTTGGCTGCAGACGCCGCATGGCCAAACCTCGTTGCTCGAAGCGTTGGTCGC
GGATCCGGAGGGGCAGCTCGGCTCCGCGTCGCGCGCGCGATTCGGCGATGTGTTGCCGTTCT 
TGGTCAAGGTGTTGGCGGCGGACGAGCCACTATCGTTGCAGGCCCATCCGAGCGCCGAGCAG 
GCGGTTGAGGGCTACCTGCGGGAAGAGCGAATGGGCATTCCGGTGTCCTCACCCGTCCGCAAC 
TACCGCGACACCAGTCACAAGCCAGAGTTATTGGTGGCGCTGCAGCCGTTCGAGGCGCTGGCC 

GGATTCCGGGAGGCGGCTCGCACCACCGAGCTGCTGCGGGCGCTGGCCGTATCCGACCTCGA
CCCGTTGATCGACTTGCTGAGCGAG6GGTCCGATGCCGATGGTTTGCGTGCGGTGTTCACCAC 
CTGGATTACCGCACCCCAGCCCGAGATCGACGTGCTGGTGCCTGCCGTGCTGGACGGCGCTAT 
CCAGTACGTCAGCTCCGGCGCAACGGAATTTGGCGCCGAAGCCAAGACAGTGCTGGAACTCGG 
CGAACGTTATCCCGGCGACGCCGGTGTGCTGGCGGCGTTGTTGCTCAACCGCATGAGCTTGGC 

TCCTGGGGAGGCGATCTTCCTGCCGGCCGGCAACGTGCACGCCTATGTGCGTGGTTTGGGTGT
GGAAGTGATGGCCAACTCCGACAACGTGTTACGCGGTGGACTTACCCCTAAGCACGTCGATGT 
GCCCGAGTTGTTGCGGGTGGTGGACTTCGCCCCCACGCCGAAGGCTCGGCTGCGGCCCCCGA 
TCCGGCGCGAGGGGCTGGGGGTGGTCTTTGAGACGCCGACCGATGAGTTCGGGGCCACGCTA 
CTGGTGCTCGACGGCGATCACCTCGGCCACGAGGTCGACGCGTCGTCCGGCCATGACGGTCC 

ACAGATCTTGTTATGCACCGAGGGTTCGGCGAGGGTGCACGGGAAGTGCGGGTCGCTCACGCT
ACAGCGCGGCACGGCCGCCTGGGTGGCGGCCGACGACGGCCCGATCCGGCTGACCGCCGGC 
CAACCCGCCAAGCTGTTCAGGGCGAGCGTCGGGTTGTGA

>Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase TB.seq 3644897:3645973 MW:37840
>emb|AL123456|MTBH37RV:c3645973-3644894, rmlA2 SEQ ID NO:123 

TTGGCAACTCACCAAGTCGATGCGGTGGTCCTGGTCGGT6GCAAGGGTACCCGACTGCGGCCG 

TTGACGCTGTCGGCGCCCAAGCCAATGCTGCCTACCGCCGGACTGCCGTTCCTCACCCATCTG
CTGTCGCGGATCGCCGCAGCGGGCATCGAGCACGTGATCCTGGGTACGTCCTACAAACCCGCA 
GTCTTCGAAGCGGAGTTCGGCGACGGGTCCGCAGTGGGCCTACAGATCGAATACGTGACCGAG 
GAGCATCCCTTGGGGACTGGCGGCGGCATCGCCAACGTTGCCGGCAAGCTGCGCAACGACAC 
CGCGATGGTGTTTAACGGCGATGTGCTCTCGGGCGCGGATCTGGCCCAACTGCTGGACTTCCA 

CCGAAGCAATCGAGCCGATGTCACGCTGCAACTGGTGCGGGTGGGCGACCCGCGGGCATTCG
GCTGCGTACCCACCGACGAGGAGGACCGCGTAGTCGCCTTTCTGGAGAAGACGGAGGATCCG 
CCGACCGACCAGATCAATGCCGGCTGCTATGTCTTCGAACGCAACGTCATCGACCGGATTCCGC 
AGGGCCGGGAGGTTTCGGTGGAACGCGAGGTGTTCCCGGCCTTGCTCGCCGACGGCGACTGC 
AAGATCTACGGCTATGTCGATGCCAGCTATTGGCGGGACATGGGCACACCGGAA6AGTTCGTTC 

GCGGATCGGC6GATCTGGTGCGCGGCATCGCCCCGTCTCCGGCCTTGCGTGGTCACCGCGGT
GAGCAGTTGGTGCACGACGGTGCGGCGGTATCTCCCGGTGCGTTGCTGATTGGCGGCACCGTC 
GTGGGGCGTGGTGCCGAAATCGGCCCCGGCACCAGATTGGACGGCGCGGTCATCTTCGATGG 
TGTCCGGGTGGAGGCCGGGTGCGTGATCGAGCGTTCGATCATCGGCTTCGGTGCTCGCATCGG 
ACCGCGGGCGTTGATCCGCGACGGTGTGATCGGTGACGGGGCCGACATCGGCGCGCGCTGCG 

AGTTGTTAAGTGGTGCCCGGGTATGGCCCGGTGTCTTTCTTCCCGACGGCGGGATCCGTTACTC
GTCCGACGTTTGA 

>Rv3368c - TB.seq 3780334:3780975 MW:23734
>emb|AL123456|MTBH37RV:c3780975-3780331, Rv3368c SEQ ID NO:124

ATGACCCTCAACCTGTCCGTCGACGAGGTCCTGACCACTACCCGCTCGGTGCGCAAGCGTCTC
GATTTCGACAAGCCGGTGCCACGCGACGTGCTGATGGAATGCCTCGAGCTGGCGCTGCAGGCG 
CCCACCGGTTCCAATTCCCAAGGCTGGCAGTGGGTGTTCGTCGAGGACGCCGCCAAGAAAAAG 
GCGATCGCCGACGTCTACCTGGCCAACGCCCGGGGCTACCTCAGCGGGCCGGCGCCCGAGTA 
CCCCGACGGCGACACCCGCGGCGAGCGGATGGGGCGGGTCCGCGATTCGGCGACCTATCTCG 

CCGAACACATGCACCGGGCGCCGGTGCTGCTGATCCCCTGCCTGAAAGGCCGGGAAGACGAG
TCGGCGGTGGGTGGCGTGTCGTT7TGGGCCTCACTGTTCCCGGCGGTGTGGAGCTTCTGCCTG 
GCGCTGCGCTCCCGCGGGCTGGGTTCGTGCTGGACGACGCTGCACCTGCTCGACAACGGCGA 
GCACAAGGTGGCCGACGTGCTCGGCATTCCCTACGACGAATACAGGCAAGGCGGGCTGCTTCC 
GATCGCCTACACACAAGGCATCGACTTCCGGCCGGCCAAGCGGCTGCCGGCCGAGAGCGTGA 

CGCACTGGAACGGCTGGTAA

>Rv3382c lytB1 TB.seq 3796447:3797433 MW:34667
>emb|AL123456|MTBH37RV:c3797433-3796444, lytB SEQ ID NO:125

ATGGGTeAGGTQTTCGTGGGACCGGTCGCACAGGGATACGCTTCGGGTGAAGTCACGGTGCTG 
TTGGCGTCGCCGCGGTCGTTTTGCGCCGGTGTAGAGCGTGCTATCGAGACGGTCAAGCGAGTG 
CTTGACGTGGCCGAAGGCCCGGTGTATGTGCGCAAGCAAATCGTGCACAACACTGTTGTGGTT
GCCGAGTTGCGGGACCGGGGAGCAGTGTTCGTCGAGGATCTCGACGAGATTCCCGATCCGCC 
GCCGCCGGGGGCGGTCGTGGTGTTCTCCGCGCATGGGGTTTCCCCGGCGGTGCGCGCGGGC 
GCTGATGAGCGGGGACTGCAGGTCGTCGACGCGACCTGCCCACTGGTGGCGAAAGTCCACGC 
TGAAGCCGCACGGTTTGCCGCGGGCGGTGACACGGTGGTCTTCATCGGGCACGCCGGACATG 

AGGAGACCGAAGGCACGCTTGGCGTCGCTCCGCGGTCAACATTATTGGTGCAGACACCCGCTG
ATGTGGCAGCGTTGAACCTGCCCGAGGGTACCCAGCTATCGTATCTGACCCAGACAACCCTGG 
CACTTGATGAAACTGCCGATGTCATTGATGCGCTGCGCGCGAGGTTTCCGACGTTGGGCCAACC 
CCCCTCTGAAGACATCTGCTATGCCACCACGMCAGACAGCGTGCGCTGCAATCGATGGTCGGT 
GAATGTGACGTTGTGTTGGTGATTGGCTCGTGCAATTCGTCGAATTCGCGGCGTCTGGTCGAGT 

TGGCGCAGCGAAGTGGGACGCCGGCCTACTTGATTGACGGGCCTGATGACATTGAGCCCGAAT
GGCTGTCGTCGGTCTCGACGATCGGTGTCACCGCGGGAGCCTCCGCGCCGCCACGACTGGTG 
GGGCAGGTGATTGATGCACTTCGCGGATACGCCTCGATCACCGTGGTGGAACGCTCGATAGCG 
ACCGAGACGGTGCGATTCGGCCTTCCCAAACAGGTTCGCGCGCAATGA 

>Rv3418c groES 10 kD chaperone TB.seq 3836985:3837284 MW:10773
>emb|AL123456|MTBH37RV:c3837284-3836982, groES SEQ ID NO:126

GTGGCGAAGGTGAACATCAAGCCACTCGAGGACAAGATTCTCGTGCAGGCCAACGAGGCCGAG 
ACCACGACCGCGTCCGGTCTGGTCATTCCTGACACCGCCAAGGAGAAGCCGCAGGAGGGCAC 
CGTCGTTGCCGTCGGCCCTGGCCGGTGGGACGAGGACGGCGAGAAGCGGATCCCGCTGGACG 
TTGCGGAGGGTGACACCGTCATCTACAGCAAGTACGGCGGCACCGAGATCAAGTACAACGGCG
AGGAATACCTGATCCTGTCGGCACGCGACGTGCTGGCCGTCGTTTCCAAGTAG 

>Rv3423c alr TB.seq 3840193:3841416 MW:43357
>emb|AL123456|MTBH37RV:c3841416-3840190, alr SEQ ID NO:127

GTGAAACGGTTCTGGGAGAATGTCGGAAAGCCAAACGACACGACAGATGGGCGGGGCACGACT
TCGTTGGCCATGACACCGATATCCCAGACACCTGGCGTCCTCGCCGAGGCCATGGTGGATCTG 
GGCGCTATTGAACACAACGTGCGGGTGCTGCGTGAGCACGCCGGCCACGCGCAGCTGATGGC 
GGTGGTCAAGGCCGACGGCTACGGTCACGGTGCTACGCGCGTCGCCCAAACCGCCCTGGGAG 
CCGGTGCGGCCGAACTCGGCGTCGCCACCGTCGACGAGGCGCTAGCGCTGCGCGCTGATGGC 

ATTACCGCACCGGTGCTGGCCTGGCTGCATCCGCCCGGCATCGACTTCGGGCCCGCGCTGCTG
GCCGACGTGCAGGTCGCGGTGTCCTCGCTGCGCCAACTCGACGAACTGTTGCACGCGGTGCG 
CCGGACCGGCCGGACGGCGAC6GTGACCGTCAAGGTGGATACCGGGCTGAACCGCAATGGCG 






TGGGACCGGCACAATTCCCGGCCATGCTGACCGCGTTACGCCAAGCCATGQCCGAGGACGCC 
GTCCGGCTGCGGGGGCTGATGTCGCATATGGTTTACGCCGACAAGCCTGACGATTCCATCAAC 
GATGTTCAGGCCCMCGGTTTACCGCCTTTCTGGCGCAGGCCCGCGAACAAGGGGTGCGGTTC 
GAGGTGGCGCATCTATCGAACTCATCAGCAACTATGGCGCGCCCCGACCTGACGTTCGACCTG 

GTGCGGCCGGGCATCGCGGTGTATGGGCTAAGCCCGGTACCCGCCCTCGGTGACATGGGGCT
GGTGCCGGCGATGACCGTGAAATGTGCTGTTGCGCTGGTGAAATCGATTCGTGCGGGGGAGGG 
CGTGTCGTATGGGCACACATGGATCGCGCCACGCGACACCAATCTGGCGCTGCTGCCGATCGG 
TTACGCAGACGGCGTGTTCCGGTCGCTGGGCGGGCGGCTGGAGGTGCTGATCAACGGCAGAC 
GATGCCCCGGTGTGGGGCGGATCTGCATGGACCAGTTCATGGTCGACCTGGGCCCCGGGCCG 

CTTGATGTGGCCGAAGGCGACGAGGCGATTTTGTTGGGGCCGGGCATCCGGGGTGAGCCCAG
GGCTCAGGAGTGGGCCGATCTTGTCGGCACCATCCACTACGAAGTGGTCACCAGCCCGCGAGG 
ACGTATCACCAGGACCTATCGCGAGGCTGAAAACCGTTGA 

>Rv3490 otsA [alpha]-trehalose-phosphate synthase TB.seq 3908232:3909731 MW:55864
>emb|AL123456|MTBH37RV:3908232-3909734, otsA SEQ ID NO:128

ATGGCTCGCTCGGGAGGCGAGGAGGCGCAGATTTGCGATTCGGAGACCTTCGGGGAGTCTGAC 
TTCGTGGTGGTAGCCAATCGACTGCCCGTCGATCTGGAGCGTCTTCCCGACGGCAGCACAACC 
TGGAAACGCAGCCCCGGAGGCTTGGTCACCGCCTTGGAGCCGGTGCTGCGGCGTCGGCGCGG 
GGCCTGGGTCGGCTGGCCCGGCGTTAACGACGACGGGGCCGAACCCGACCTCCACGTGCTGG 

ACGGCGCCATCATCCAAGACGAGCTGGAACTTCATCCGGTACGGCTGAGCAGCACGGACATAG
CTCAGTACTACGAGGGATTCTCCAACGCCACACTGTGGCCGCTGTACCACGACGTCATCGTCAA 
GCCGCTCTACCACCGCGAAT6GTGGGATCGCTACGTCGACGTCAACCAGCGCTTTGCCGAGGC 
CGCGTCGCGCGCCGCCGCCCACGGCGCAACCGTGTGGGTACAGGACTACCAGCTGCAGCTGG 
TACCGAAGATGCTGCGCATGCTGCGGCCCGATCTGACCATCGGTTTCTTTTTGGACATCCCGTT 

CCCGCCGGTAGAGCTGTTTATGCAGATGCCGTGGCGCACCGAGATCATCCAGGGCCTACTGGG
CGCCGACCTGGTGGGCTTCCATCTTCCGGGCGGTGCCCAGAATTTCCTGATCCTGTCCCGGCG 
TCTGGTCGGCACCGACACTTCCCGCGGAACCGTCGGTGTGCGGTCGCGGTTCGGTGCGGCGG 
TGCTCGGGTCCCGCACCATACGAGTTGGCGCCTTTCCTATCTCGGTTGACTCCGGCGCGCTCG 
ACCACGCTGCCCGCGACCGCAACATCAGGCGCCGGGCCCGCGAGATTCGCACCGAACTGGGA 

AATCCGCGCAAGATCCTGCTCGGTGTTGACCGGCTCGACTACACCAAGGGCATCGACGTACGG
CTGAAGGCCTTTTCCGAGCTGCTGGCCGAGGGCCGCGTCAAAGGCGACGACACCGTCGTGGTC 
CAGCTGGCTACCCCGAGCCGCGAGCGGGTGGAGAGCTAGCAGACGCTGCGCAACGACATCGA 
ACGCCAGGTCGGCCACATTAACGGCGAGTACGGTGAGGTTGGCCATCCGGTAGTGCATTACCT 
GCATCGACCGGCTCCGCGCGACGAGCTTATCGCTTTCTTCGTGGCCAGCGACGTCATGCTGGT 

CACCCCACTACGCGACGGGATGAACCTGGTGGCCAAGGAGTACGTCGCTTGCCGCAGCGATCT
TGGCGGTGCCCTGGTGCTCAGCGAATTCACGGGGGCCGCAGCCGAACTCCGGCACGCATACCT 
GGTCAACCCGCACGACCTGGAAGGGGTCAAGGACGGGATAGAGGAAGCGCTCAACCAGACGG
AGGAGGCGGGCCGGCGGCGAATGCGQTCGCTGCGACGCCAAGTGCTCGCCCACGACGTGGA
CCGCTGGGCACAGTCGTTTCTCGACGCTCTCGCCGGGGCACACCCGAGGGGCCAAGGCTAA 

>Rv3598c lysS lysyl-tRNA synthase TB.seq 4041423:4042937 MW:55678
>emb|AL123456|MTBH37RV:c4042937-4041420, lysS SEQ ID NO:129

GTGAGTGCCGCTGACACAGCAGAAGACCTTCCTGAGCAGTTCCGGATTCGCCGGGACAAGCGC 
GCTCGCTTGCTGGCCCAGGGGCGCGATCCCTATCCCGTCGCGGTGCCGCGCACTCACACGTTG 
GCCGAGGTTCGCGCCGCCCACCCTGACTTGCCGATCGATACCGCGACCGAAGACATCGTCGGC 
GTCGCGGGCCGAGTGATCTTTGCGCGCAACTCGGGAAAGCTATGCTTTGCGACACTTCAGGAC 

GGCGATGGTACCCAGCTGCAAGTGATGATCAGCCTCGACAAGGTCGGCCAGGCTGCTCTCGAC
GCATGGAAAGCCGATGTCGACCTGGGCGACATCGTCTACGTGCATGGCGCGGTGATCAGTTCG 
CGGCGCGGCGAGCTGTCCGTCCTGGCGGATTGCTGGCGGATCGCCGCCAAGTCGCTGCGGCC 
GCTTCCCGTGGCGCACAAAGAGATGAGTGAAGAGTCGCGGGTTCGTCAGCGCTATGTTGACCT 
CATAGTTCGACCGGAAGCGCGCGCGGTGGCTCGACTACGGATCGCCGTCGTCCGCGCGATCC 

GGACGGCGCTTCAACGTCGTGGGTTCCTGGAAGTCGAGACGCCCGTCTTGCAGACGTTAGCCG
GTGGTGCGGGGGCCCGTCCGTTCGCCACTCATTCCAATGCCCTAGACATCGATCTGTACCTGCG 
GATCGCGCCGGAACTGTTCCTCAAGCGCTGCATCGTGGGTGGTTTCGACAAGGTCTTCGAACTT 
AATCGAGTGTTCCGAAACGAAGGAGCCGATTCGACGCATTGTCCGGAATTCTCCATGCTGGAGA 
CCTACCAGACCTACGGAACCTATGACGATTCGGCAGTCGTCACCCGGGAGCTTATTCAAGAGGT 

GGCCGATGAGGCGATCGGAACCAGACAACTGCCGTTGCCCGACGGCAGTGTCTATGACATCGA
CGGAGAATGGGCGACTATACAAATGTACCCGTCGCTGTCTGTGGCGCTCGGTGAAGAGATCAC 
ACCGCAGACGACGGTCGATCGCTTACGTGGGATCGCCGATAGCCTTGGCCTGGAGAAAGACCC 
AGCGATTCATGACAACCGTGGCTTCGGCCACGGCAAACTCATCGAGGAACTCTGGGAGCGCAC 
AGTGGGGAAGAGCTTGAGCGCAGGCACATTTGTCAAGGATTTTCCGGTTCAGAGAACGGCTTTG 

ACCCGTCAGCACCGCAGTATCCCCGGCGTAACCGAGAAGTGGGACCTCTATCTGCGCGGAATC
GAACTTGCCACCGGCTACTCGGAATTAAGCGACCCGGTAGTGCAGCGGGAGAGATTCGCCGAC 
CAGGCCCGTGCCGCGGCCGCTGGCGATGACGMGCGATGGTGCTTGACGAGGATTTTCTGGCC 
GCTCTGGAGTACGGCATGCCACCGTGCACCGGAACCGGAATGGGTATCGATCGGTTGTTGATG 
TCTTTGACTGGGTTGTCMTTAGGGAGACAGTTTTGTTCCCGATTGTTCGACCACACTCCAACTG 

A

>Rv3600c - similar to Bacillus subtilis protein YacB TB.seq 4043041:4043856 MW:29274
>emb|AL123456|MTBH37RV:c4043856-4043038, Rv3600c SEQ ID NO:130
GTGCTGCTGGCGATTGACGTCCGCAACACCCACACCGTTGTGGGCCTGCTGTCCGGAATGAAA 
GAGCACGCAAAGGTCGTGCAGCAGTGGCGGATACGCACCGAATCCGAAGTCACCGCCGACGAA
CTGGCACTGACGATCGACGGGCTGATCGGCGAGGATTCCGAGCGGCTCACCGGTACCGCCGC 
CTTGTCCACGGTCCCGTCCGTGCTGCACGAGGTGCGGATAATGCTCGACCAGTACTGGCCGTC
QGTGCCGCACGTGCTGATCGAGCCCGGAGTACGCACCGGGATCCCTTTGCTCGTCGACAACCC
GAAGGAAGTGGGCGCAGACCGCATCGTGAACTGTTTGGCCGCCTATGACCGGTTCCGGAAGGC 
CGCCATCGTCGTTGACTTTGGATCCTCGATCTGTGTTGATGTTGTATCGGCCMGGGTGMTTTC 
TTGGCGGCGCCATCGCGGCCGGGGTGCAGGTGTCTTCCGATGCCGCGGCGGCCCGCTCGGCG 

GCATTGCGCCGCGTTGAACTTGCCCGCCCACGTTCGGTGGTTGGCAAGAACACCGTCGAATGC
ATGCAAGCCGGTGCGGTGTTCGGCTTCGCCGGGCTGGTAGACGGGTTGGTAGGCCGCATCCG 
CGAGGACGTGTCCGGTTTGTCCGTCGACCACGATGTCGCGATCGTGGCTACCGGGCATACCGC 
GCCCCTGCTGCTGCCGGAATTGCACACCGTCGACCATTACGACCAGCACCTGACCTTGCAGGG 
TCTGCGGCTGGTGTTCGAGCGTAACCTCGAAGTCCAGCGCGGCCGGCTCAAGACGGCGCGCT 

GA

>Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase TB.seq 4048181:4048744 MW:20732
>emb|AL123456|MTBH37RV:c4048744-4048178, folK SEQ ID NO:131

ATGACGCGGGTAGTGCTCTCGGTTGGCTCCAACCTGGGTGACCGCCTGGCACGATTGCGGTCG
GTCGCCGACGGTCTCGGCGATGCGTTGATTGCGGCTTCCCCGATATATGAGGCCGACCCCTGG 
GGTGGGGTGGAGCAGGGGCAGTTCCTCAATGCGGTGCTGATCGCCGACGATCCTACCTGCGAA 
CCGCGGGAGTGGCTGCGGCGGGCGCAGGAGTTCGAGCGCGCTGCGGGCAGGGTGCGTGGCC 
AGCGCTGGGGTCCACGAAATCTCGACGTCGACCTGATCGCCTGCTACCAGACCTCGGCCACCG 

AGGCTCTGGTCGAAGTGACCGCGCGGGAGAACCACCTCACGCTGCCGCACCCACTGGCGCAT
CTGCGGGCCTTTGTGTTGATCCCGTGGATTGCCGTCGACCCAACGGCGCAGCTGACGGTTGCC 
GGGTGCCCGCGGCCCGTCACGCGACTGCTGGCCGAGCTGGAGCCCGCCGACCGCGACAGTGT 
GCGGTTGTTTAGGCCGTCGTTCGATCTGAATAGCAGACACCCCGTCAGTCGGGCACGGGAAAG 
CTGA 

>Rv3607c folX may be involved in folate biosynthesis TB.seq 4048744:4049142 MW:14553
>emb|AL123456|MTBH37RV:c4049142-4048741, folX SEQ ID NO:132

ATGGCTGACCGAATCGAACTGCGCGGCCTGACCGTGCATGGTCGGCACGGGGTCTACGACCAC 
GAGCGAGTGGCCGGGCAGCGGTTTGTCATCGATGTCACCGTGTGGATAGACCTGGCCGAGGC 
CGCCAACAGCGACGACTTGGCCGACACCTATGACTACGTGCGGCTGGCTTCGCGGGCGGCCG
AGATCGTCGCCGGACCCCCGCGGAAGCTGATCGAAACGGTCGGGGCCGAGATCGCTGATCAC 
GTGATGGACGACCAGCGAGTGCATGCCGTTGAGGTGGCGGTACACAAGCCGCAGGCGCCCATT 
CCGCAGACGTTCGACGATGTGGCGGTGGTGATCCGACGCTCACGGCGCGGCGGCCGCGGTTG 
GGTAGTCCCGGCGGGCGGCGCGGTATGA 

>Rv3608c folP dihydropteroate synthase TB.seq 4049138:4049977 MW:28812
>emb|AL123456|MTBH37RV:c4049977-4049135, folP SEQ ID NO:133

GTGAGTCCGGCGCCCGTGCAGGTGATGGGGGTTCTAAACGTCACGGACGACTCTTTCTCGGAC

GGCGGGTGTTATCTCGATCTCGACGATGCGGTGAAGCACGGTCTGGCGATGGCAGCCGCAGGT 

GCGGGCATCGTCGACGTCGGTGGTGAGTCGAGCCGGCCCGGTGCCACTCGGGTTGACCCGGC 

GGTGGAGACGTCTCGTGTCATACCCGTCGTCAAAGAGCTTGCAGCACAAGGCATCACCGTCAG 

CATCGATACCATGCGCGCGGATGTCGCTCGGGCGGCGTTGCAGAACGGTGCCCAGATGGTCAA 

CGACGTGTCGGGTGGGCGGGCCGATCCGGCGATGGGGCCGCTGTTGGCCGAGGCCGATGTG 

CCGTGGGTGTTGATGCACTGGCGGGCGGTATCGGCCGATACCCCGCATGTGCCTGTGCGCTAC 

GGCAACGTGGTGGCCGAGGTCCGTGCCGACCTGCTGGCCAGCGTCGCCGACGCGGTGGCCGC 

AGGCGTCGACCCGGCAAGGCTGGTGCTCGATCCCGGGCTTGGATTCGCCAAGACGGCGCAAC 

ATAATTGGGCGATCTTGCATGCCCTTCCGGAACTGGTCGCGACCGGAATCCCAGTGCTGGTGG 

GTGCTTCGCGCAAGCGCTTCCTCGGTGCGTTGTTGGCCGGGCCCGACGGCGTGATGCGGCCA 

ACCGATGGGCGTGACACCGCGACGGCGGTGATTTCCGCGCTGGCCGCACTGCACGGGGCCTG 

GGGTGTGCGGGTGCATGATGTGCGGGCCTCGGTCGATGCCATCAAGGTGGTCGAAGCGTGGAT 

GGGAGCGGAAAGGATAGAACGCGATGGCTGA 

>Rv3609c folE GTP cyclohydrolase I TB.seq 4049977:4050582 MW:22395
>emb|AL123456|MTBH37RV:c4050582-4049974, folE SEQ ID NO:134 

ATGTCGCAGCTGGATTCGCGCAGCGCATCTGCTCGTATCCGTGTGTTCGACCAGCAACGTGCC 

GAGGCCGCGGTGCGCGAATTGCTGTACGCGATCGGCGAGGATCCGGATAGGGACGGCTTGGT 

AGCCACCCCGTCCCGGGTTGCCCGGTCATACCGCGAAATGTTCGGCGGGCTCTACACCGACCC 

CGACTCGGTGTTGAACACCATGTTCGACGAAGACCACGACGAGCTGGTGTTGGTCAAGGAAATC 

CCTATGTACTCCACCTGCGAACACCACCTGGTGGCGTTCCACGGTGTGGCCCACGTCGGCTAC 

ATCCCGGGCGACGACGGCAGGGTGACCGGCTTGTCAAAGATCGCGCGACTGGTCGATCTGTAC 

GCCAAGCGAGCTCAGGTCCAGGAGCGGCTCACCAGTCAGATCGCCGATGCCCTGATGAAAAAA 

CTCGATCCACGCGGGGTAATCGTGGTGATCGAGGCTGAGCATCTGTGCATGGCGATGCGCGGG 

GTTCGCAAGCCCGGCTCGGTCACCACTACGTCGGCGGTGCGCGGACTGTTCAAAACCAATGCC 

GCTTCTCGAGCCGAAGCGCTCGACCTCATTTTGCGGAAGTGA 

>Rv3610c ftsH inner membrane protein, chaperone TB.seq 4050601:4052880 MW:81987
>emb|AL123456|MTBH37RV:c4052880-4050598, ftsH SEQ ID NO:135 

ATGAACCGGAAAAACGTGACTCGCACCATAACAGCGATCGCCGTCGTGGTGCTGCTCGGCTGG 

TCGTTCTTTTACTTCAGCGACGACACCCGCGGCTACAAGCCCGTTGATACCTCGGTGGCGATAA 

CACAGATCAACGGCGACAACGTCAAGAGCGCACAGATCGACGATCGCGAGCAACAGCTGCGGC 

TGATCCTGAAGAAGGGTAACAACGAGACCGACGGGTCCGAGAAGGTCATCACCAAGTACCCCA 

CCGGGTACGCCGTCGACCTGTTCAACGCGCTCAGCGCCAAAAACGCGAAGGTCAGCACGGTCG 

TCAACCAGGGCAGCATCCTGGGCGAGCTGCTGGTCTACGTGCTGCCGCTGCTGTTGCTGGTGG 

GGCTGTTCGTGATGTTCTCCCGCATGCAAGGCGGCGCCCGGATGGGCTTCGGGTTCGGCAAGT
CACGCGCCAAGCAACTGAGCAAGGACATGCCCAAGACCACCTTCGCCGAC6TCGCAGGTGTCG
ACGAGGCGGTCGAGGAGCTCTACGAGATCAAGGACTTCCTGCAGAACCCCAGCAGGTACCAAG 
CGCTGGGCGCCAAGATCCCCAAAGGCGTGCTGCTCTACGGGCCGCCGGGAACCGGTAAGACG 
TTGCTGGCTCGTGCGGTGGCCGGCGAAGCCGGAGTGCCGTTCTTCACCATCTCCGGCTCCGAC 

TTCGTGGAAATGTTCGTCGGCGTCGGCGCATCCCGTGTCAGAGACCTGTTCGAGCAGGCCAAG
CAGAACAGCCCGTGCATCATCTTCGTCGACGAGATCGACGCCGTCGGGCGACAAAGAGGCGCC 
GGGCTGGGCGGCGGTCACGACGAGCGTGAGCAGACCCTCAACCAGTTGCTAGTCGAAATGGA 
CGGTTTTGGCGATCGCGCCGGCGTCATCCTGATCGCGGCCACCAACCGGCCCGACATCCTGGA 
CCCGGCGCTGTTGCGGCCGGGCCGCTTCGACCGCCAGATCCCGGTATCCAACCCCGATCTGG 

CGGGTCGGCGGGCGGTGCTGCGCGTGCACTCGAAGGGCMGCCGATGGCCGCGGACGCCGA
CCTCGACGGACTGGCCAAGCGGACCGTCGGCATGACCGGAGCCGACCTGGCCAACGTCATCA 
ACGAGGCGGCGCTGCTGACCGCCCGGGAGAACGGCACCGTCATCACCGGTCCCGCCCTCGAG 
GAAGCGGTGGACCGGGTGATCGGGGGCCCGCGCCGCAAAGGCCGGATCATCAGCGAGCAGGA 
GAAGAAGATCACCGCCTATCACGAGGGCGGGCACACCCTGGCCGCTTGGGCGATGCCCGATAT 

CGAGCCGATTTATAAGGTGACGATCCTGGCGCGCGGGCGTACCGGCGGGCACGCGGTGGCGG
TGCCGGAAGAAGACAAGGGCCTGCGGACCCGCTCGGAAATGATCGCGCAACTGGTGTTCGCGA 
TGGGTGGGCGCGCCGCCGAAGAACTGGTGTTTCGTGAGCCGACCACCGGCGCGGTGTCCGAC 
ATCGAGCAGGCCACCAAGATAGCGCGCTCAATGGTCACCGAATTTGGAATGAGCTCCAAGCTG 
GGCGCGGTCAAATACGGCTCCGAACACGGCGACCCG7TCCTCGGACGTACCATGGGCACCCAG 

CCGGACTACTCCCACGAGGTCGCCCGCGAGATCGACGAAGAGGTCCGCAAGCTTATCGAGGCG
GCGCATACCGAAGCGTGGGAAATCCTGACCGAATACCGCGACGTGCTGGACACTTTGGCCGGC 
GAGCTGCTGGAAAAGGAGACGCTGCACCGACCCGAGCTGGAAAGCATCTTCGCTGACGTCGAA 
AAGCGGCCGCGGCTCACCATGTTCGACGACTTCGGTGGCCGGATCCCGTCGGACAAACCGCCC 
ATCAAGACACCCGGCGAGCTCGCGATCGAACGCGGCGAACCTTGGCCCCAGCCGGTCCCCGA 

GCCGGCGTTCAAGGCGGCGATTGCGCAGGCTACCCAAGCCGCTGAGGCCGCCCGGTCCGACG
CCGGCCAAACCGGGCACGGCGCCAACGGTTCGCCCGCCGGCACCCACCGGTCC6GTGACCGC 
CAGTACGGCTCCACCCAGCCTGACTACGGTGCCCCGGCGGGCTGGCATGCGCCGGGATGGCC 
CCCAAGGTCATCTCATCGGCCCAGCTATAGCGGTGAACCGGCACCGACGTATCCGGGTCAGCC 
CTACCCGACCGGTCAAGCCGATCGGGGTTCCGATGAGTCCTCGGCGGAGCAGGATGACGAGGT 

CAGTCGGACCAAGCCGGCCCACGGCTGA

>Rv3671c - TB.seq 4112322:4113512 MW:40722
>emb|AL123456|MTBH37RV:c4113512-4112319, Rv3671c SEQ ID NO:136

ATGACCCCGTCGCAGTGGCTGGATATCGCCGTCTTGGCGGTCGCATTTATTGCAGCCATCTCCG 
GCTGGCGTGCCGGTGCGCTGGGCTCAATGCTGTCGTTTGGCGGGGTGCTGCTGGGCGCGACA
GCCGGCGTGCTGCTGGCGCCGCATATCGTCAGTCAAATCAGCGCTCCGCGGGCCAAACTGTTT 
GCCGCGCTGTTCCTGATCCTGGCACTGGTCGTAGTCGGCGAGGTCGCTGGTGTGGTGCTGGGC
CGCGCCGTCCGCGGGGCGATCCGTAACCGGCCGATCCGGTTGATCGACTCGGTCATTGGGGTA

GGGGTGCAGCTGGTCGTGGTGCTCACCGCGGCGTGGTTGTTGGCGATGCCGCTGACACAGTC 

GAAAGAGCAGCCCGAGCTGGCTGCCGCGGTGAAGGGTTCGCGGGTGCTCGCCCGGGTCAACG 

AGGCGGCACCCACCTGGCTGAAGACGGTGCCCAAGCGGCTGTCGGCCCTGCTGAACACCTCC 

GGCCTGCCCGCGGTTTTGGAGCCGTTCAGCCGCACGCCGGTCATTCCAGTGGCCTCACCCGAC 

CCAGCGCTGGTCAACAATCCGGTGGTGGCGGCCACCGAGCCAAGTGTCGTCAAAATCCGCAGC 

CTGGCACCCAGATGCCAGAAAGTGTTGGAGGGCACCGGCTTCGTGATCTCACCCGATCGGGTG 

ATGACCAACGCGCACGTGGTGGCCGGATCCAACAACGTCACGGTGTATGCCGGCGACAAGCCC 

TTCGAGGCCACGGTGGTGTCCTACGACCCGTCGGTCGACGTAGCGATCCTGGCCGTTCCGCAC 

TTGCCGCCGCCGCCGCTGGTCTTCGCTGCGGAGCCGGCGAAAACCGGTGCCGACGTTGTGGT 

GCTGGGTTATCCCGGCGGCGGCMTTTCACTGCCACACCCGCCAGGATTCGCGAGGCCATCAG 

ACTCAGTGGCCCCGATATTTAGGGGGACCCGGAGCCGGTTACCCGCGACGTGTACACCATCAG 

AGCCGATGTGGAGCAAGGTGATTCGGGTGGGCCCCTGATCGACCTCAACGGTCAGGTGCTCGG 

TGTGGTGTTCGGCGCAGCCATCGACGACGCCGAAACTGGGTTTGTGCTGACGGCCGGCGAGGT 

GGCGGGGCAGCTTGCCAAAATCGGTGCTACCCAACCGGTCGGCACCGGGGCCTGCGTCAGCT 

GA 

>Rv3682 ponA2 TB.seq 4121913:4124342 MW:84637 
>emb|AL123456|MTBH37RV:4121913-4124345, ponA 1 SEQ ID NO:137

ATGCCCGAGCGCCTCCCGGCCGCGATCACCGTTCTGAAGCTGGCTGGGTGCTGTCTGTTGGCC 

AGTGTCGTCGCCACTGCGCTGACGTTCCCGTTCGCAGGCGGGCTAGGGCTGATGTCCAATCGT 

GCCTCTGAGGTCGTTGCCAACGGCTCGGCCCAGCTGCTCGAGGGGCAAGTGCCTGCGGTATCG 

ACGATGGTCGACGCGAAGGGCAACACGATCGCGTGGCTGTACTCGCAGCGCCGGTTCGAGGT 

GCCCTCGGACAAGATCGCCAACACGAT6AAGCTGGCGATCGTCTCGATTGAAGATAAGCGGTTC 

GCCGACCACAGCGGCGTGGACTGGAAGGGCACCCTGACCGGCCTGGCGGGCTACGCGTCCG 

GCGACCTCGACACGCGCGGCGGCTCGACGCTCGAACAACAGTACGTGAAGAACTACCAACTGC 

TGGTGACAGCCCAAACCGATGCCGAGAAGCGAGCGGCCGTCGAAACCACTCCGGCCCGCAAG 

CTTCGCGAGATCCGGATGGCACTCACGCTGGACAAGAGCTTCACAAAATCTGAAATCCTGACCC 

GATACTTGAACCTGGTCTCGTTCGGCAATAACTCGTTCGGCGTGCAGGACGCGGCGCAAACGTA 

CTTCGGCATCAACGCGTCCGACCTGAATTGGCAGCAAGCGGCGCTGCTGGCCGGCATGGTGCA 

ATCGACCAGCACGCTCAACCCGTACACCAACCCCGACGGCGCGCTGGCCCGGCGGAACGTGG 

TCCTCGACACCATGATCGAGAACCTTCCCGGGGAGGCGGAGGCGTTGCGTGCCGCCAAGGCC 

GAGCCGCTGGGGGTACTGCCGCAGCCCAATGAGTTGCCGCGCGGCTGCATCGCGGCCGGCGA 

CCGCGCATTCTTCTGCGAGTACGTCCAGGAGTACCTGTCTCGGGCCGGGATCAGCAAGGAGCA 

GGTCGCCACGGGCGGGTACCTGATCCGCACCACCCTGGACCCAGAGGTGCAGGCACCGGTCA 

AGGCCGCCATCGACAAGTACGCCAGCCCGAACCTGGCCGGTATTTCCAGCGTGATGAGCGTGA 

TCAAACCGGGTAAGGATGCGCACAAGGTGTTGGCCATGGCCAGTAACCGCAAATACGGGCTGG
ATCTAGAAGCCGGCGAAACCATGCGGCCGCAGCCATTCTCCCTGGTTGGCGACGGCGCCGGGT
CTATCTTCAAGATCTTCACCACGGCCGCTGCTCTGGACATGGGCATGGGTATTAACGCCCAACT 
CGACGTGCCGCCCCGATTCCAGGCCAAAGGTCTGGGAAGTGGCGGGGCAAAGGGGTGCCCCA 
AAGAGACCTGGTGTGTGGTGAACGCCGGCAACTACCGCGGCTCGATGAATGTCACCGACGCGC 
TGGCAACCTCGCCAAACACCGCGTTCGCCAAGCTGATCTCGCAGGTCGGGGTGGGGCGTGCG
GTCGATATGGCCATCAAACTCGGGCTGAGGTCTTATGCGAATCCCGGCACCGCACGCGACTAG 
AACCCCGACAGCAATGAGAGCTTGGCTGACTTCGTCAAACGACAGAACCTGGGTTCGTTCACCC 
TCGGCCCCATCGAGTTAAACGCGCTGGAGCTGTCCAACGTGGCGGCCACGTTGGCATCCGGCG 
GCGTGTGGTGCCCCCCCAACCCAATCGACCAGCTCATCGACCGCAACGGCAACGAAGTCGCGG 
TCACCACCGAGACGTGCGACCAGGTGGTGCCCGCAGGGCTGGCGAACACCCTCGCCAACGCG
ATGAGCAAGGACGCCGTGGGCAGCGGCACGGCGGCCGGTTCGGCCGGCGCGGCGGGCTGGG 
ATCTGCCGATGTCCGGCAAAACCGGCACCACCGAGGCGCACCGGTCGGCCGGCTTCGTGGGC 
TTCACCAACCGCTACGCGGCGGCGAACTACATCTACGACGACTCCAGCTCGCCGACAGATCTGT 
GTTCCGGCCCGCTGCGCCATTGCGGCAGCGGCGACTTGTACGGCGGCAACGAGCCATCCCGC 
ACCTGGTTCGCCGCGATGAAGCCGATCGCCAACAACTTCGGCGAAGTGCAGCTACCACCGACC
GATCCACGCTATGTCGACGGCGCACCAGGCTCACGGGTACCAAGCGTGGCCGGTCTGGATGTC 
GACGCCGCACGCCAGCGCCTCAAGGACGCGGGCTTCCAGGTCGCCGACCAAACCAACTCGGT 
CAACAGCTCCGCCAAGTATGGTGAGGTGGTCGGAACGTCGCCCAGCGGTCAAACAATTCCGGG 
TTCGATCGTCACGATCCAGATCAGCAACGGCATCCCGCCGGCTCCGCCTCCGCCACCGCTGCC 
TGAGGATGGTGGGCCGCCACCGCCGGTCGGATCGCAGGTGGTGGAGATTCCGGGGCTGCCGC
CGATCACCATTCCGCTGCTGGCGCCACCACCCCCAGCGCCTCCCCCGTAG 

>Rv3721c dnaZX DNA polymerase III, [gamma] (dnaZ) and [tau] (dnaX) TB.seq 4164995:4166728 MW:61892
>emb|AL123456|MTBH37RV:c4166728-4164992, dnaZX SEQ ID NO:138
GTGGCTCTCTACCGCAAGTACCGACCGGCAAGCTTCGCGGAGGTGGTGGGGCAGGAGCACGT 
CACCGCGCCGCTGTCGGTGGCGCTGGATGCCGGCCGGATCAACCACGCGTACCTGTTCTCTGG 
GCCGCGTGGCTGCGGAAAGACGTCGTCAGCGCGTATCCTGGCGCGGTCGTTGAACTGTGCGCA 
GGGCCCTACCGCCAACCCGTGCGGGGTCTGCGAATCCTGCGTTTCGTTGGCGCCCAACGCCCC 
CGGCAGCATCGACGTGGTAGAGCTGGATGCCGCCAGCCACGGCGGCGTGGACGACACCCGCG 
AGCTGCGGGACCGCGCGTTCTATGCGCCGGTCCAGTCACGGTACCGGGTATTTATCGTCGACG 
AGGCGCACATGGTGACCACCGCGGGATTCAACGCGCTGCTCAAGATCGTGGAGGAACCGCCC 
GAACACCTGATCTTCATATTCGCCACCACCGAACCGGAGAAGGTACTGCCGACGATTCGGTCGC 
GCACTCATCACTACCCGTTCGGGCTGCTGCCGCCGCGCACTATGCGGGCGTTGCTCGCGCGGA 
TCTGCGAGCAGGAGGGCGTCGTCGTCGACGATGCGGTGTACCCGTTGGTGATCCGGGCCGGC 
GGAGGTTCCCCACGGGATACGCTCTCGGTGCTGGACCAATTGCTGGCTGGGGCCGCGGACAC 
CCACGTGACCTACACCCGGGCGCTGGGGCTGCTGGGTGTCACCGACGTCGCCCTGATCGACG 
ACGCGGTCGACGCACTGGCCGCTTGCGATGCGGCCGCATTGTTCGGGGCGATCGAATCGGTGA 






TCGATGGCGGACATGACCCTCGGCGTTTCGCTACCGATCTGCTGGAGCGATTCCGCGACCTGA 
TTGTGCTGCAATCGGTTCCCGACGCGGCATCTCGCGGGGTGGTGGATGCGCCCGAAGACGCG 
CTGGATCGGATGCGCGAGCAAGCCGCCCGGATCGGGCGGGCGACCCTGACCCGATATGCCGA 
GGTGGTGCAGGCCGGGCTAGGCGAGATGCGCGGTGCGACCGCGCCGCGTCTGCTGCTGGAA 
GTGGTTTGCGCGCGACTGCTGCTGCCCTCGGCGAGCGACGCCGAATCGGCACTGTTGCAGCG
GGTCGAAGGGATCGAGACCCGGTTGGACATGTCGATCCCGGCGCCGCAAGCCGTACCACGCC 
CGTCGGCTGCGGCTGCCGAGCCGAAACACCAGCCCGCGCGTGAACCGAGACC6GTGCTGGCC 
CCCACACCGGCCTCGAGCGAACCCACCGTGGCCGCGGTTCGGTCCATGTGGCCGACGGTGCG 
CGACAAGGTGCGCCTGCGCAGCCGTACCACCGAGGTGATGCTGGCGGGTGCCACCGTCCGTG 

CGCTAGAGGACAACACGCTGGTGCTGACCCACGAATCGGCGCCGCTGGCGCGGCGGCTGTCC
GAAGAGCGCAACGCCGATGTCCTCGCCGAGGCGCTTAAAGACGCGCTGGGAGTCAACTGGCG 
GGTGCGGTGTGAGACCGGTGAACCGGCTGCGGCGGCATCACCCGTCGGCGGGGGAGCGAAC 
GTGGCGACCGCCAAGGCCGTAAACCCTGCCCCCACAGCGAATTCCACTCAGCGCGACGAAGAG 
GAGCACATGCTCGCCGAAGCCGGCCGTGGCGACCCGTCGCCGCGTCGCGACCCGGAAGAGGT 

TGCACTCGAGCTGCTGCAGAACGAGCTGGGCGCGCGCCGGATAGACAACGCCTAG

>Rv3783 - TB.seq 4229255:4230094 MW:32337 

>emb|AL123456|MTBH37RV:4229255-4230097, Rv3783 SEQ ID NO:139

ATGACATTCATGGATGCTCAAGCTAGCTTCCAGACACAGTCGCGGACACTGGCCCGCGTCCGA 
GGCGATCTGGTCGACGGGTTCCGCCGCCACGAGGTGTGGCTGCACCTGGGCTGGCAGGACAT
CAAGCAGCGGTACCGCCGCTCGGTGCTGGGGCCGTTCTGGATCACCATCGCCACCGGAACGA 
CCGCCGTCGCGATGGGCGGCCTGTATTCCAAGCTGTTTCGGCTCGAGCTGTCTGAGCACCTGC 
CCTACGTCACGCTCGGGCTGATCGTCTGGAACCTGATCAACGCCGCCATCCTGGACGGCGCAG 
AGGTTTTCGTCGCCAACGAAGGTCTGATCAAACAGCTGCCGGCACCGTTGAGCGTGCACGTCTA 
TCGGTTGGTGTGGCGGCAGATGATCTTCTTCGCCCACAACATCGTCATCTACTTCGTCATCGCG
ATCATCTTTCCTAAGCCGTGGTCGTGGGCGGATCTGTCGTTTCTTCCGGCGCTGGCGCTCATTT 
TCCTCMTTGCGTTTGGGTGTCACTGTGTTTCGGCATCCTGGCGACCCGCTACCGCGACATCGG 
CCCGCTGCTGTTTTCCGTTGTGCAGTTGTTGTTCTTCATGACGGCGATCATCTGGAACGACGAGA 
CCCTGCGTCGGCAGGGCGCGGGCCGCTGGTCGAGCATCGTCGAGCTCAACCCGCTGCTGCAC 
TATCTGGACATCGTGCGGGCGCCACTGTTGGGCGCTCACCAGGAGCTGCGGCACTGGCTGGTG
GTGCTGGTGTTGACCGTCGTCGGCTGGATGCTGGCGGCGTTCGCGATGCGGCAGTATCGCGC 
GCGGGTGCCCTACTGGGTGTAG 

>Rv3789 - TB.seq 4235371:4235733 MW:13378 
>emb|AL123456|MTBH37RV:4235371-4235736, Rv3789 SEQ ID NO:140

ATGCGGTTCGTTGTCACCGGCGGCCTCGCTGGGATAGTTGACTTTGGCCTCTACGTCGTGCTGT 
ACAAGGTGGCGGGCCTACAGGTCGACCTGTCCAAGGCCATCAGCTTCATCGTCGGCACCATCA
CCGCGTACCTGATCAACCGCCGGTGGACATTCCAGGCCGAGCCCAGCACGGCCCGATTCGTCG
CGGTCATGCTCCTCTACGGAATCACCTTCGCCGTGCAGGTCGGACTCAACCACCTCTGCCTCGC 
ACTCTTGCACTACCGGGCGTGGGCCATCCCCGTCGCGTTTGTGATCGCGCAGGGCACCGCCAC 
GGTAATCAACTTCATCGTGCAGCGAGCCGTGATCTTCCGGATCCGCTGA 

>Rv3790 - TB.seq 4235776:4237158 MW:50164 

>emb|AL123456|MTBH37RV:4235776-4237161, Rv3790 SEQ ID NO:141

ATGTTGAGCGTGGGAGCTACCACTACCGCCACCCGGCTGACCGGGTGGGGCCGCACAGCGCC 
GTCGGTGGCGAATGTGCTTCGCACCCCAGATGCCGAGATGATCGTCAAGGCGGTGGCTCGGGT 

CGCCGAGTCGGGGGGCGGCCGGGGTGCTATCGCGCGCGGGCTGGGCCGCTCCTATGGGGAC
AACGCCCAAAACGGCGGTGGGTTGGTGATCGACATGACGCCGCTGAACACTATCCACTCCATTG 
ACGCCGACACCAAGCTGGTCGACATCGACGCCGGGGTCAACCTCGACCAACTGATGAAAGCCG 
CCCTGCCGTTCGGGCTGTGGGTCCCGGTGCTGCCGGGAACCCGGCAGGTCACCGTCGGCGGG 
GCGATCGCCTGCGATATCCACGGCAAGAACCATCACAGCGCTGGCAGCTTCGGTAACCACGTG 

CGCAGCATGGACCTGCTGACCGCCGACGGCGAGATCCGTCATCTCACTCCGACCGGCGAGGA
CGCCGAACTGTTCTGGGCCACCGTCGGGGGCAACGGTCTCACCGGCATCATCATGCGGGCCAC 
CATCGAGATGACGCCCACTTCGACGGCGTACTTCATCGCCGACGGCGACGTCACCGCCAGCCT 
CGACGAGACCATCGCCCTGCACAGCGACGGCAGCGAAGCGCGCTACACCTATTGCAGTGCCTG 
GTTCGACGCGATCAGCGCTCCCCCGAAGCTGGGCCGCGCGGCGGTATCGCGTGGCCGCCTGG 

CCACCGTCGAGCAATTGCCTGCGAAACTGCGGAGCGAACCTTTGAAATTCGATGCGCCACAGCT
ACTTACGTTGCCCGACGTGTTTCCCAACGGGCTGGCCAACAAATATACCTTCGGCCCGATCGGC 
GAACTGTGGTACCGCAAATCCGGCACCTATCGCGGCAAGGTCCAGAACCTCACGCAGTTCTACC 
ATCCGCTGGACATGTTCGGCGAATGGAACCGCGCCTACGGCCCAGCGGGCTTCCTGCAATATC 
AGTTCGTGATCCCCACAGAGGCGGTTGATGAGTTCAAGAAGATCATCGGCGTTATTCAAGCCTC 

GGGTCACTACTCGTTTCTCAACGTGTTCAAGCTGTTCGGCCCCCGCAACCAGGCGCCGCTCAGC
TTCCCCATCCCGGGCTGGAACATCTGCGTCGACTTCCCCATCAAGGACGGGCTGGGGAAGTTC 
GTCAGCGAACTCGACCGCCGGGTACTGGAATTCGGCGGCCGGCTCTACACCGCCAAAGACTCC 
CGTACCACGGCCGAAACCTTTCATGCCATGTATCCGCGCGTCGACGAATGGATCTCCGTGCGCG 
GCAAGGTCGATCCGCTGCGCGTATTCGCCTCCGACATGGCCCGACGCTTGGAGCTGCTGTAG 

>Rv3791 - TB.seq 4237162:4237923 MW:27470
>emb|AL123456|MTBH37RV:4237162-4237926, Rv3791 SEQ ID NO:142

ATGGTTCTTGATGCCGTAGGAAACCCCCAGACGGTGCTGCTGCTGGGTGGCACCTCCGAGATC 
GGGCTCGCCATCTGCGAGCGCTACCTGCACAATTCGGCGGCCCGCATCGTGCTGGCCTGCCTG 
CCCGACGACCCACGGCGGGAGGACGCGGCCGCTGCGATGAAGCAGGCCGGCGCGCGGTCGG
TGGAGCTGATCGACTTTGACGCCCTGGATACCGACAGCCACCCGAAGATGATCGAGGCGGCCT 
TCTCCGGCGGTGATGTGGACGTGGCTATCGTCGCGTTCGGCTTGCTCGGCGACGCCGAAGAGC
TGTGGCAGAACCAGCGCAAGGCGGTGCAGATCGCCGAAATCAACTACACCGCAGCGGTTTCGG

TGGGCGTGCTGCTGGCTGAGAAGATGCGCGCTCAGGGCTTCGGTCAGATCATCGCGATGAGCT 

CGGCCGCCGGTGAGCGGGTGCGACGGGCGMCTTCGTCTACGGCTCCACCAAGGCCGGTCTG 

GACGGGTTTTACCTGGGGTTGTCAGAAGCGCTGCGCGAGTACGGTGTTCGTGTGCTGGTGATC 

CGGCCCGGCCAGGTGCGTACCCGGATGAGCGCGCACCTCAAGGAAGCTCCATTGACCGTCGA 

CMGGAGTACGTCGCCMCCTCGCGGTGACCGCGTCCGCAAMGGTMGGMTTGGTTTGGGC 

GCCAGCAGCGTTCCGCTACGTCATGATGGTGTTGCGTCACATCCCGCGGAGCATCTTCCGCAA 

GCTGCCCATCTGA 

>Rv3794 embA TB.seq 4243230:4246511 MW:115694
>emb|AL123456|MTBH37RV:4243230-4246514, embA SEQ ID NO:143

GTGCCCCACGACGGTAATGAGCGATCTCACCGGATCGCACGCCTAGCAGCCGTCGTCTCGGGA 

ATCGCGGGTCTGCTGCTGTGCGGCATCGTTCCGCTGCTTCCGGTGAACCAAACCACCGCGACC 

ATCTTCTGGCCGCAGGGCAGCACCGCCGACGGCAACATCACCCAGATCACCGCCCCTCTGGTA 

TCCGGGGCGCCACGCGCGCTGGACATCTCGATCCCCTGCTCGGCCATCGCCACGCTGCCCGC 

CAACGGCGGCCTGGTGCTGTCCACACTGCCGGCCGGTGGCGTGGATACCGGTAAGGCCGGGC 

TGTTCGTCCGCGCCAACCAGGACACGGTCGTCGTGGCGTTCCGCGACTCGGTGGCCGCGGTG 

GCGGCCCGCTCCACGATCGCAGCGGGAGGCTGTAGCGCGCTGCATATCTGGGCCGATACCGG 

CGGCGCGGGCGCTGATTTTATGGGTATACCCGGCGGCGCCGGGACCCTGCCGCCGGAGAAGA 

AGCCACAGGTTGGCGGCATCTTCACCGACCTGAAGGTCGGAGCGCAGCCCGGGCTGTCGGCC 

CGCGTCGACATCGACACTCGGTTTATCACGACGCCCGGCGCGCTCAAGAAGGCCGTGATGCTC 

CTCGGCGTGCTGGCGGTCCTGGTAGCCATGGTGGGGCTGGCCGCGCTGGACCGGCTCAGCAG 

GGGCCGCACCCTGCGCGACTGGCTGACCCGATATCGCCCGCGGGTGCGGGTCGGATTCGCCA 

GCCGGCTCGCTGACGCAGCGGTGATCGCGACCTTGTTGCTCTGGCATGTCATCGGCGCCACCT 

CGTCCGATGACGGCTACCTTCTGACCGTCGCCCGGGTCGCCCCGAAGGCCGGCTATGTAGCCA 

ACTACTACCGGTATTTCGGCACGACGGAGGCGCCGTTCGACTGGTATACATCGGTGCTTGCCCA 

GCTGGCGGCGGTGAGCACCGCCGGCGTCTGGATGCGCCTGCCCGCCACCCTGGCCGGAATCG 

CCTGCTGGCTGATCGTCAGCCGTTTCGTGCTGCGGCGGCTGGGACCGGGCCCGGGCGGGCTG 

GCGTCCAACCGGGTCGCTGTGTTCAGCGCTGGTGCGGTGTTCCTGTCCGCCTGGCTGCCGTTC 

AACAACGGCCTGCGTCCCGAGCCGCTGATCGCGCTGGGTGTGCTGGTCACGTGGGTGTTGGTG 

GAACGGTCGATCGCGCTCGGACGGCTGGCCCCGGCCGCGGTAGCCATCATCGTGGCGACGCT 

TACCGCGACGCTGGCACCGCAGGGGTTGATCGCGCTGGCCCCGCTGCTGACTGGTGCGCGCG 

CCATCGCCCAGAGGATCCGGCGCCGCCGGGCGACCGATGGACTGCTGGCGCCGCTGGCGGT 

GCTGGCCGCGGCGTTGTCGCTGATCACCGTGGTGGTGTTTCGGGACCAGACGCTGGGCACGGT 

GGCCGAATCGGCACGCATCAAGTACAAGGTCGGCCCGACCATCGCCTGGTACCAGGACTTCCT 

GCGCTACTACTTCCTTACCGTGGAGAGCAACGTTGAGGGGTCGATGTCCCGCCGGTTCGCGGT 

GCTGGTGTTGCTGTTCTGCCTGTTCGGGGTGCTGTTCGTGCTGCTGCGGCGCGGCCGGGTGGC 
GGGGCTGGCCAGCGGCCCGGCCTGGCGACTGATCGGCACTACGGCGGTCGGCCTGCTGCTGC

TCACGTTCAGGCCAACCAAGTGGGCCGTGCAGTTCGGCGCATTCGCCGGGCTGGCCGGGGTGT 

TGGGTGCGGTCACCGCGTTCACCTTTGCCCGCATCGGTCTACATAGTCGACGCAACCTCACGCT 

GTACGTGACCGCGTTGCTGTTCGTGCTGGCGTGGGCAACCTCGGGCATCAACGGGTGGTTCTA 

CGTCGGCAACTACGGGGTGCCGTGGTATGACATCCAGCCCGTCATCGCCAGCCACCCGGTGAC 

GTCGATGTTTCTGACGCTGTCGATCCTCACCGGATTGCTGGCAGCCTGGTATCACTTCCGGATG 

GACTACGCCGGGCACACCGAAGTCAAAGACAACCGGCGCAACCGCATCTTGGCCTCTACGCCA 

CTGCTGGTGGTCGCGGTGATCATGGTCGCAGGCGAAGTCGGCTCGATGGCCAAGGCCGCGGT 

GTTCCGTTACCCGCTTTACACCACCGCCAAGGCCAACCTGACCGCGCTCAGCACCGGGCTGTC 

CAGCTGTGCGATGGCCGACGACGTGCTGGCCGAGCCCGACCCCAATGCCGGCATGCTGCAAC 

CGGTTCCGGGCCAGGCGTTCGGACCGGACGGACCGCTGGGCGGTATCAGTCCCGTCGGCTTC 

AAACCCGAGGGCGTGGGCGAGGACCTCAAGTCCGACCCGGTGGTCTCCAAACCCGGGCTGGT 

CAACTCCGATGCGTCGCCCAACAAACCCAACGCCGCCATCACCGACTCCGCGGGCACCGCCGG 

AGGGAAGGGCCCGGTCGGGATCAACGGGTCGCACGCGGCGCTGCCGTTCGGATTGGACCCGG 

CACGTACCCCGGTGATGGGCAGCTACGGGGAGAACAAGCTGGCCGCCACGGCCACCTCGGCC 

TGGTACCAGTTACCGCCCCGCAGCCCGGACCGGCCGCTGGTGGTGGTTTCCGCGGCCGGCGC 

CATCTGGTCCTACAAGGAGGACGGCGATTTCATCTACGGCCAGTCCCTGAAACTGCAGTGGGG 

CGTCACCGGCCCGGACGGCCGCATCCAGCCACTGGGGCAGGTATTTCCGATCGACATCGGACC 

GCAACCCGCGTGGCGCAATCTGCGGTTTCCGCTGGCCTGGGCGCCGCCGGAGGCCGACGTGG 

CGCGCATTGTCGCCTATGACCCGAACCTGAGCCCTGAGCAATGGTTCGCCTTCACCCCGCCCC 

GGGTTCCGGTGCTGGAATCTCTGCAGCGGTTGATCGGGTCAGCGACACCGGTGTTGATGGACA 

TCGCGACCGCAGCCAACTTCCCCTGCCAGCGACCGTTTTCCGAGCATCTCGGGATTGCCGAGC 

TTCCGCAGTACCGGATCCTGCCGGACCACAAGCAGACGGCGGCGTCGTCGAACCTATGGCAGT 

CCAGCTCGACCGGCGGTCCGTTCCTGTTCACCCAGGCGCTGCTGCGCACCTCGAGGATCGCCA 

CGTACCTGCGTGGGGACTGGTATCGCGACTGGGGATCGGTGGAGCAGTACCACCGGCTGGTG 

CCGGCCGATCAGGCTCCAGACGCCGTTGTCGAGGAGGGCGTGATCACTGTGCCCGGCTGGGG 

TCGGCCAGGACCGATCAGGGCGCTGCCATGA 

>Rv3795 embB TB.seq 4246511:4249804 MW:118023
>emb|AL123456|MTBH37RV:4246511-4249807, embB SEQ ID NO:144

ATGACACAGTGCGCGAGCAGACGCAAMGCACCCCAAATCGGGCGATTTTGGGGGCTTTTGCG 
TCTGCTCGCGGGACGCGCTGGGTGGCCACCATCGCCGGGCTGATTGGCTTTGTGTTGTCGGTG 
GCGACGCCGCTGCTGCCCGTCGTGCAGACCACCGCGATGCTCGACTGGCCACAGCGGGGGCA 
ACTGGGCAGCGTGACCGCCCCGCTGATCTCGCTGACGCCGGTCGACTTTACCGCCACCGTGCC 
GTGCGACGTGGTGCGCGCCATGCCACCCGCGGGCGGGGTGGTGCTGGGCACCGCACCCAAG 
CAAGGCAAGGACGCCAATTTGCAGGCGTTGTTCGTCGTCGTCAGCGCCCAGCGCGTGGACGTC 
ACCGACCGCAACGTGGTGATCTTGTCCGTGCCGCGCGAGCAGGTGACGTCCCCGCAGTGTCAA 
CGCATCGAGGTCACCTCTACCCACGCCGGCACCTTCGCCAACTTCGTCGGGCTCAAGGACCCG
TCGGGCGCGCCGCTGCGCAGCGGCTTCCCCGACCCCAACCTGCGCCCGCAGATTGTCGGGGT 
GTTCACCGACCTGACCGGGCCCGCGCCGCCCGGGCTGGCGGTCTCGGCGACCATCGACACCC 
GGTTCTCCACCCGGCCGACCACGCTGAAACTGCTGGCGATCATCGGGGCGATCGTGGCCACCG 
TCGTCGCACTGATCGCGTTGTGGCGCCTGGACCAGTTGGACGGGCGGGGCTCAATTGCCCAGC
TCCTCCTCAGGCCGTTCCGGCCTGCATCGTCGCCGGGCGGCATGCGCCGGCTGATTCCGGCAA 
GCTGGCGCACCTTCACCCTGACCGACGCCGTGGTGATATTCGGCTTCCTGCTCTGGCATGTCAT 
CGGCGCGAATTCGTCGGACGACGGCTACATCCTGGGCATGGCCCGAGTCGCCGACCACGGCG 
GCTACATGTCCAACTATTTCCGCTGGTTCGGCAGCCCGGAGGATCCCTTCGGCTGGTATTAGAA 

CCTGCTGGCGCTGATGACCCATGTCAGCGACGCCAGTCTGTGGATGCGCCTGCCAGACCTGGC
CGCCGGGCTAGTGTGCTGGCTGCTGCTGTCGCGTGAGGTGCTGCCCCGCCTCGGGCCGGCGG 
TGGAGGCCAGCAAACCCGCCTACTGGGCGGCGGCCATGGTCTTGCTGACCGCGTGGATGCCG 
TTCAACAACGGCCTGCGGCCGGAGGGCATCATCGCGCTCGGCTCGCTGGTCACCTATGTGCTG 
ATCGAGCGGTCCATGCGGTACAGCCGGCTCACACCGGCGGCGCTGGCCGTCGTTACCGCCGC 

ATTCACACTGGGTGTGCAGCCCACCGGCCTGATCGCGGTGGCCGCGCTGGTGGCCGGCGGCC
GCCCGATGCTGCGGATCTTGGTGCGCCGTCATCGCCTGGTCGGCACGTTGCCGTTGGTGTCGC 
CGATGCTGGCCGCCGGCACCGTCATCCTGACCGTGGTGTTCGCCGACCAGACCCTGTCAACGG 
TGTTGGAAGCCAGCAGGGTTCGCGCCAAAATCGGGCCGAGCCAGGCGTGGTATACCGAGAACC 
TGCGTTACTACTACCTCATCCTGCCCACCGTCGACGGTTCGCTGTCGCGGCGCTTCGGCI I I T T 

GATCACCGCGCTATGCCTGTTCACCGCGGTGTTCATCATGTTGCGGCGCAAGCGAATTCCCAGC
GTGGCCCGCGGACCGGCGTGGCGGCTGATGGGCGTCATCTTCGGCACCATGTTCTTCCTGATG 
TTCACGCCCACCAAGTGGGTGCACCACTTCGGGCTGTTCGCCGCCGTAGGGGCGGCGATGGC 
CGCGCTGACGACGGTGTTGGTATCCCCATCGGTGCTGCGCTGGTCGCGCAACCGGATGGCGTT 
CCTGGCGGCGTTATTCTTCCTGCTGGCGTTGTGTTGGGCCACCACCAACGGCTGGTGGTATGTC 

TCCAGCTACGGTGTGCCGTTCAACAGCGCGATGCCGAAGATCGACGGGATCACAGTCAGCACA
ATCTTTTTCGCCCTGTTTGCGATCGCCGCCGGCTATGCGGCCTGGCTGCACTTCGCGCGGGGC 
GGCGCCGGCGAAGGGCGGCTGATCCGCGCGCTGACGACAGCCCCGGTACCGATCGTGGCCG 
GTTTCATGGCGGCGGTGTTCGTCGCGTCCATGGTGGCCGGGATCGTGCGACAGTACCCGACCT 
ACTCCAACGGCTGGTGCAACGTGCGGGCGTTTGTCGGCGGCTGCGGACTGGCCGACGACGTA 

CTCGTCGAGCCTGATACCMTGCGGGTTTCATGAAGCCGCTGGACGGCGATTCGGGTTCTTGG
GGCCCCTTGGGCCCGCTGGGTGGAGTCAACCCGGTCGGCTTCACGCCCAACGGCGTACCGGA 
ACACACGGTGGCCGAGGCGATCGTGATGAAACCCAACCAGCCCGGCAGCGACTACGACTGGGA 
TGCGCCGACCAAGCTGACGAGTCCTGGCATCAATGGTTCTAGGGTGCCGCTGCCGTATGGGGT 
CGATCCCGCCCGGGTACCGTTGGCAGGCACCTACACCACCGGCGCACAGCAACAGAGCACACT 

CGTCTCGGCGTGGTATCTCCTGCCTAAGCCGGACGACGGGCATCCGCTGGTCGTGGTGACCGC
CGCGGGCAAGATCGCCGGCAACAGCGTGCTGCACGGGTACACCCCCGGGCAGACTGTGGTGC 
TCGAATACGCCATGCCGGGACCCGGAGCGCTGGTACCCGCCGGGCGGATGGTGCCCGACGAC 
CTATACGGAGAGCAGCCCAAGGCGTGGCGCAACCTGCGCTTCGCCCGAGCAAAGATGCCCGC

CGATGCCGTCGCGGTCCGGGTGGTGGCCGAGGATCTGTCGCTGACACCGGAGGACTGGATCG 

CGGTGACCCCGCCGCGGGTACCGGACCTGCGCTCACTGCAGGAATATGTGGGCTCGACGCAG 

CCGGTGCTGCTGGACTGGGCGGTCGGTTTGGCCTTCCCGTGCCAGCAGCCGATGCTGCACGC 

CAATGGCATCGCCGAAATCCCGAAGTTCCGCATGACACCGGACTACTCGGCTAAGAAGCTGGAC 

ACCGACACGTGGGAAGACGGCACTAACGGCGGCCTGCTCGGGATCACCGACCTGTTGCTGCG 

GGCCCACGTCATGGCCACCTACCTGTCCCGCGACTGGGCCCGCGATTGGGGTTCCCTGCGCAA 

GTTCGACACCCTGGTCGATGCCCCTCCCGCCCAGCTCGAGTTGGGCACCGCGACCCGCAGCG 

GCCTGTGGTCACCGGGCAAGATCCGAATTGGTCCATAG 

>Rv3834c serS seryl-tRNA synthase TB.seq 4307655:4308911 MW:45293
>emb|AL123456|MTBH37RV:c4308911-4307652, serS SEQ ID NO:145

GTGATCGACCTGAAGCTGCTTCGTGAAAACCCCGACGCGGTACGCCGCTCACAACTCAGCCGC 

GGCGAGGACCCGGCGCTGGTAGATGCCCTGCTGACGGCCGAGGCCGCCCGCCGGGCCGTGA 

TCTCGACCGCCGATTCGTTACGGGCCGAGCAGAAAGCCGCCAGCAAAAGCGTGGGTGGCGCG 

TCTCCCGAAGAGCGCCCGCCGCTGCTGCGGCGCGCGAAGGAACTCGCCGAGCAGGTCAAAGC 

CGCTGAGGCCGACGAGGTCGAAGCGGAGGCGGCGTTCACCGCGGCGCACCTGGCGATCTCGA 

ATGTCATCGTGGACGGGGTACCCGCCGGCGGGGAGGACGACTACGCGGTGCTCGACGTCGTC 

GGCGAGCCCAGCTACCTCGAGAACCCCAAGGACCACCTGGAGCTCGGCGAGTCGCTGGGCCT 

GATCGACATGCAGCGCGGCGCCAAGGTGTCGGGTTCACGGTTCTACTTCCTGACCGGTCGGGG 

TGCCCTACTGCAGCTTGGATTGCTGCAGCTGGCGCTGAAGCTAGCCGTCGACAACGGCTTTGTC 

CCTACGATCCCGCCGGTGCTGGTGCGCCCGGAAGTGATGGTAGGCACGGGATTTCTAGGCGCC 

CACGCCGAGGAGGTGTACCGGGTAGAGGGCGACGGCCTCTACCTTGTGGGCACCTCCGAGGT 

ACCGCTGGCGGGGTATCACTCCGGCGAGATTCTGGACCTTTCCCGCGGGCGGCTGCGGTATGC 

GGGCTGGTCGTCGTGTTTCCGACGTGAGGCCGGCAGCCATGGCAAGGACACGCGCGGCATCA

TCCGGGTGCACCAGTTCGACAAAGTCGAGGGCTTCGTCTACTGCACACCGGCCGACGCGGAGC 

ACGAACATGAGCGGCTGCTGGGCTGGCAGCGCCAGATGCTGGCACGCATCGAGGTGCCGTAT 

CGGGTCATCGACGTGGCCGCGGGTGATCTCGGCTCGTCGGCCGCCCGCAAGTTCGACTGCGA 

GGCGTGGATTCCGACGCAGGGGGCCTATCGCGAGCTGACGTCGACGTCGAACTGCACCACCTT 

TCAGGCGCGCCGGTTGGCGACCCGCTACCGGGATGCCAGCGGCAAGCCGCAGATCGCGGCCA 

CCCTCAACGGAACGCTGGCCACCACCCGGTGGCTGGTTGCGATCCTGGAGAACCACCAGCGG 

CCCGACGGCAGCGTTAGAGTCCCGGACGCACTGGTTCCGTTCGTGGGTGTCGAAGTGCTGGAG 
CCGGTCGCTTAG 

>Rv3907c pcnA polynucleotide polymerase TB.seq 4391631:4393070 MW:53057
>emb|AL123456|MTBH37RV:c4393070-4391628, pcnA SEQ ID NO:146
GTGCCGGAAGCCGTCCAGGAAGCCGATCTGCTAACCGCCGCTGCGGTTGCCTTGAACAGGCAT
GCTGCCTTATTGCGGGAACTCGGGTCGGTGTTCGCCGCCGCGGGACACGAGTTGTATCTGGTC 
GGCGGTTCGGTGCGAGATGCACTGTTGGGCCGGTTGAGCCCCGACCTGGACTTCACCACCGAC 
GCCCGTCCCGAGCGGGTGCAGGAGATCGTGCGGCCGTGGGCCGATGCGGTGTGGGATACCG 
GAATCGAATTCGGCACCGTCGGCGTGGGTAAGAGCGACCACCGCATGGAGATCACCACATTCC
GTGCCGACAGCTACGACCGGGTTTCGCGTCATCCAGAGGTACGTTTCGGCGATTGCCTCGAGG 
GCGATCTGGTCCGCCGCGACTTCACCACGAACGCAATGGCTGTGCGCGTCACCGCCACTGGGC 
CGGGCGAATTCCTGGATCCGCTTGGTGGCTTGGCGGCGCTGCGGGCCAAGGTGTTAGACACCC 
CGGCGGCGCCGTCGGGGTCCTTTGGCGACGATCCGTTGCGGATGCTGCGCGCCGCGCGGTTC 

GTCTCGCAACTTGGATTCGCGGTGGCGCCGCGGGTGCGCGCGGCGATCGAAGAGATGGCGCC
GCAGTTGGCCCGAATCAGCGCCGAACGGGTGGCCGCCGAGCTGGACAAGCTGCTGGTCGGTG 
AGGATCCGGCCGCGGGTATCGACCTGATGGTGCAGAGCGGTATGGGTGCTGTGGTCTTGCCTG 
AAATCGGTGGGATGCGGATGGCGATCGACGAACATCACCAGCACAAGGACGTCTATCAGCATTC 
CTTGACCGTGCTGCGGCAGGCGATCGCGCTGGAGGACGACGGCCCGGATCTGGTGTTGCGCT 

GGGCGGCGCTGCTGCACGACATCGGCAAGCCCGCCACCCGCCGTCACGAACCCGACGGTGGG
GTGAGCTTCCATCACCACGAAGTGGTCGGCGCCAAGATGGTGCGCAAGCGGATGCGGGCGCT 
GAAGTATTCCAAGCAGATGATCGACGACATCTCGCAGCTGGTCTACCTGCATCTGCGGTTTCAC 
GGCTACGGCGATGGGAAATGGACCGACTCTGCGGTGCGCCGCTATGTCACCGACGCCGGGGC 
CCTACTGCCACGGCTGCACAAGCTGGTGCGCGCCGACTGCACGACCCGCAACAAGCGCCGGG 

CCGCGCGGTTGCAGGCCAGTTACGACCGGCTGGAAGAGCGGATCGCGGAGCTGGCCGCCCAG
GAGGATCTGGATCGGGTGCGCCCCGACCTGGACGGCAACCAGATCATGGCGGTGCTCGACATT 
CCGGCGGGCCCGCAAGTCGGCGAGGCGTGGCGCTACTTGAAGGAGCTGCGGCTAGAGCGCG 
GCCCGTTGTCCACCGAGGAGGCGACAACCGAGCTGCTGTCCTGGTGGAAATCACGGGGGAAC 
CGCTAG 


TABLE 4 

>Rv0002 dnaN DNA polymerase III, b-subunit TB.seq 2052:3257 MW:42114 SEQ ID NO:147 
MDAATTRVGLTDLTFRLLRESFADAVSVVVAKNLPARPAVPVLSGVLLTGSDNGLTISGFDYEVSAEA 
QVGAEIVSPGSVLVSGRLLSDITRALPNKPVDVHVEGNRVALTCGNARFSLPTMPVEDYPTLPTLPEE 
TGLLPAELFAEAiSQVAIAAGRDDTLPMLTGIRVEILGETWU\ATDRFRLAVRELKWSASSPDIEAAV
WAKTLAEAAKAGIGGSDVRLSLGTGPGVGKDGLLGISGNGKRSTTRLLDAEFPKFRQLLPTEHTAVA 
TMDVAELIEAIKLVALVADRGAQVRMEFADGSVRLSAGADDVGRAEEDLWDYAGEPLTIAFNPTYLT 
DGLSSLRSERVSFGFTTAGKPALLRPVSGDDRPVAGLNGNGPFPAVSTDYVYLLMPVRLPG 

>Rv0003 recF DNA replication and SOS induction TB.seq 3280:4434 MW:42181 SEQ ID NO:148
VWRHLGLRDFRSWACVDLELHPGRWF^GPNGYGKTNLIEALWYSTTLGSHRV 
AVISTIWNDGRECAVDLEIATGRVNKARLNRSSVRSTRDWGVLRAVLFAPEDLGLVRGDPADRRR 
YLDDLAIVRRPAIAAVRAEYERVLRQRTALLKSVPGARYRODRGVFDTLEVWDSRLAEHGAELVAAR!
DLVNQLAPEVKKAYQLLAPESRSASIGYRASMDVTGPSEQSDIDRQLLAARLLAALAARRDAELERG 
VCLVGPHRDDULRLGDQPAKGFASHGEAWSLAVALRLAAYQLLRVDGGEPVLLLDDVFAELDVMRR 
RALATAAESAEQVLVTAAVLEDIPAGWDARRVHIDVRADDTGSMSWLP 


>Rv0005 gyrB DNA gyrase subunit B TB.seq 5123:7264 MW:78441 SEQ ID NO:149 
MGKNEARRSALAPDHGTWCDPLRRLNRMHATPEESIRIVAAQKKKAQDEYGAASITILEGLEAVRKR 
PGMYIGSTGERGLHHLIWEWDNAVDEAMAGYAm/NWLLEDGGVEVADDGRGIPVATHASGIPTV 
DWMTQLHAGGKFDSDAYAISGGLHGVGVSWNALSTRLEVEIKRDGYEWSQVYEKSEPLGLKQGA 

PTKKTGSTVRFWADPAVFETTEYDFETVARRLQEMAFLNKGLTINLTDERVTQDEWDEWSDVAEA
PKSASERAAESTAPHKVKSRTFHYPGGLVDFVKHINRTKNAIHSSIVDFSGKGTGHEVEIAMQWNAG 
YSESVHTFANTINTHEGGTHEEGFRSALTSWNKYAKDRKLLKDKDPNLTGDDIREGLAAVISVKVSE 
PQFEGQTKTKLGNTEVKSFVQKVCNEQLTHWFEANPTDAKWVNKAVSSAQARIAARKARELVRRK 
SATDIGGLPGKI^CRSTDPRKSELYWEGDSAGGSAKSGRDSMFQAILPLRGKIINVEKARIDRVLK 

I^EVQAIITALGTGIHDEFDIGKLRYHKIVL^DADVDGQHISTLLLTLLFRFMRPUENGHVFLAQPPUY
KLKWQRSDPEFAYSDRERDGLLEAGLKAGKKINKEDGIQRYKGLGEMDAKELWETTMDPSVRVLRQ 
VTLDDAAAAOELFSILMGEDVDARRSFITRNAKDVRFLDV 

>Rv0006 gyrA DNA gyrase subunit A TB.seq 7302:9815 MW:92276 SEQ ID NO:150
MTDTTLPPDDSLDRIEPVDIEQEMQRSYIDYAMSVIVGRALPEVRDGLKPVHRRVLYAMFDSGFRPD
RSHAKSARSVAETMGNYHPHGDASIYDSLVRMAQPWSLRYPLVDGQGNFGSPGNDPPAAMRYTEA 
RLTPUVMEMLREIDEEWDFIPNYDGRVQEPTVLPSRFPNLLANGSGGIAVGMATNIPPHNLRELADA 
VFWALENHDADEEETLAAVMGRVKGPDFPTAGLIVGSQGTADAYKTGRGSIRMRGWEVEEDSRG 
RTSLVITELPYQVNHDNFITSIAEQVRDGKLAGISNIEDQSSDRVGLRMEIKRDAVAKWINNLYKHTQ 
LQTSFGANMI^VDGVPRTLRLDCXIRYWDHQLDVIVRRTTYRLRKANERAHILRGLVKALDALDEVI
ALIRASETVDIARAGLIELLDIDEIQAQAILDMQLRRLAALERQRIIDDLAKIEAEIADLEDILAKPERQRGI 
VRDELAEIVDRHGDDRRTRIIAADGDVSDEDLIAREDNAA/TITETGYAKRTKTDLYRSQKRGGKGVQG 
AGLKQDDIVAHFFVCSTHDLILFFTTQGRVYRAKAYDLPEASRTARGQHVANLLAFQPEERIAQVIQiR 
GYTDAPYLVLATRNGLVKKSKLTDFDSNRSGGIVAVNLRDNDELVGAVLCSAGDDLLLVSANGQSIR 
FSATDEALRPMGRATSGVQGMRFNIDDRLLSLNWREGTYLLVATSGGYAKRTAIEEYPVQGRGGK
GVLWMYDRRRGRLVGALIVDDDSELYAVTS6GGVIRTAARQVRKAGRQTKGVRLMNLGEGDTLLAI 
ARNAEESGDDNAVDANGADQTGN 

>Rv0014c pknB serine-threonine protein kinase TB.seq 15593:17470 MW:66511 SEQ ID NO:151 
MTTPSHLSDRYELGEILGFGGMSEVHLARDLRLHRDVAVKVLRADLARDPSFYLRFRREAQNAAALN
HPAIVAWDTGEAETPAGPLPYIVMEYVDGVTLRDIVHTEGPMTPKRAIEV1ADACQALNFSHQNGIIH 
RDVKPANIMISATNAVKVMDFGIARAIADSGNSVTQTAAVIGTAQYLSPEQARGDSVDARSDVYSLGC 
VLYEVLTGEPPFTGDSPVSVAYQHVREDPIPPSARHEGLSADLDAWLKALAKNPENRYQTAAEMRA
DLVRVHNGEPPEAPKVLTDAERTSLLSSAAGNLSGPRTDPLPRQDLDDTDRDRSIGSVGRWVAWA 
VLAVLTVVVTIAINTFGGITRDVQVPDVRGQSSADAIATLQNRGFKIRTLQKPDSTIPPDHVIGTDPAAN 
TSVSAGDEITVNVSTGPEQREIPDVSTLTYAEAVKKLTAAGFGRFKQANSPSTPELVGKVIGTNPPAN 
QTSAITNWIIIVGSGPATKDIPDVAGQTVDVAQKNLNVYGFTKFSQASVDSPRPAGEVTGTNPPAGT
WPVDSVIELQVSKGNQFVMPDLSGMFVWDAEPRLRALGWTGMLDKGADVDAGGSQHNRWYQN 
PPAGTGVNRDGIITLRFGQ

>Rv0016c pbpA TB.seq 18762:20234 MW:51577 SEQ ID NO:152 

MNASLRRISVTVMALIVLLLLNATMTQVFTADGLRADPRNQRVLLDEYSRQRGQITAGGQLLAYSyAT
DGRFRFLRVYPNPEVYAPVTGFYSLRYSSTALERAEDPILNGSDRRLFGRRLADFFTGRDPRGGNV 
DTTINPRI(X^GWDAMQQGCYGF^GAWALEPSTGKILALVSSPSYDPNLLASHNPEVQAQAWQR 
LGDNPASPLTNRAISETYPPGSTFKVITTAAALAAGATETEQLTAAPTIPLPGSTAQLENYGGAPCGDE 
PTVSLREAFVKSCNTAFVQLGIRTGADALRSMARAFGLDSPPRPTPLQVAESTVGPIPDSAALGMTSI 

GQKDVALTPU\NAEIAATIANGGITMRPYLVGSLKGPDLANISTTVGYQQRRAVSPQVAAKLTELMVG
AEKVAQQKGAIPGVQIASKTGTAEHGTDPRHTPPHAWYIAFAPAQAPKVAVAVLVENGADRLSATGG 
ALAAPIGRAVIEAALQGEP 

>Rv0017c rodA TB.seq 20234:21640 MW:50612 SEQ ID NO:153 

MTTRLQAPVAVTPPLPTRRNAELLLLCFAAVITFAALLWQANQDQGVPWDLTSYGLAFLTLFGSAHL
AIRRFAPYTDPLLLPWALLNGLGLVMIHRLDLVDNEIGEHRHPSANQQMLWTLVGVAAFALWTFLK 
DHRQLARYGYICGLAGLVFLAVPALLPAALSEQNGAKIWIRLPGFSIQPAEFSKILLUFFSAVLVAKRG 
LFTSAGKHLLGMTLPRPRDLAPLLAAWVISVGVMVFEKDLGASLLLYTSFLWVYI^TQRFSVVWIGL 
TLFAAGTLVAYFIFEHVRLRVQTWLDPFADPDGTGYQIVQSLFSFATGGIFGTGLGNGQPDTVPAAST 

DFIIAAFGEELGLVGLTAILMLYTMIRGLRTAIATRDSFGKLLAAGLSSTI-AIQLFIWGGVTRLIPLTGLT
TPWMSYGGSSLLANYILLAILARISHGARRPLRTRPRNKSPITAAGTEVIERV 

>Rv0018c ppp TB.seq 21640:23181 MW:53781 SEQ ID NO:154 

VARmWRYAARSDRGLVRANNEDSWAGARLLALADGMGGHAAGEVASQLVIAALAHLDDDEPG 
GDLLAKLDAAVRAGNSAIAAQVEMEPDLEGMGTTLTAILFAGNRLGLVHIGDSRGYLLRDGELTQITK
DDTFVQTLVDEGRITPEEAHSHPQRSLIMRALTGHEVEPTLTMREARAGDRYLLCSDGLSDPVSDETI 
LEALQIPEVAESAHRLIEUM.RGGGPDNVTVWADWDYDYGQTQPILAGAVSGDDDQLTLPNTAAG 
PJ\SAISQRKEIVKRVPPQADTFSRPRWSGRRLAFWALVTVLMTAGLLIGRAIIRSNYYVADYAGSVSI 
MRGIQGSLLGMSLHQPYLMGCLSPRNELSQISYGQSGGPLDCHLMKLEDLRPPERAQVRAGLPAGT 
LDDAIGQLRELAANSLLPPCPAPRATSPPGRPAPPTTSETTEPNVTSSPASPSPTTSAPAPTGTTPAIP
TSASPAAPASPPTPWPVTSSPTMAALPPPPPQPGIDCRAAA 
>Rv0019c - TB.seq 23273:23737 MW:17153 SEQ ID NO:155
MQGLVLQLTF^GFLMLLWVFIWSVLRILKTDIYAPTGAVMMRRGLALRGTLLGARQR 
EGALTGARITLSEQP\^IGRADDSTLVLTDDYASTRHARLSMRGSEVVWEDLGSTlslGTY 
AVRVPIGTPVRIGKTAIELRP 


>Rv0020c - TB.seq 23864:25444 MW:56881 SEQ ID NO:156 

MGSQKRLVQRVERKLEQTVGDAFARIFGGSIVPQEVEALLRREAADGIQSLQGNRLLAPNEYIITLGV 
HDFEKLGADPELKSTGFARDLADYIQEQGWQTYGDWVRFEQSSNLHTGQFRARGTVNPDVETHP 
PVIDCARPQSNHAFGAEPGVAPMSDNSSYRGGQGQGRPDEYYDDRYARPQEDPRGGPDPQGGS 

DPRGGYPPETGGYPPQPGYPRPRHPDQGDYPEQIGYPDQGGYPEQRGYPEQRGYPDQRGYQDQ
GRGYPDQGQGGYPPPYEQRPPVSPGPAAGYGAPGYDQGYRQSGGYGPSPGGGQPGYGGYGEY 
GRGPARHEEGSYVPSGPPGPPEQRPAYPDQGGYDQGYQQGATTYGRQDYGGGADYTRYTESPR 
VPGYAPQGGGYAEPAGRDYDYGQSGAPDYGQPAPGGYSGYGQGGYGSAGTSVTLQLDDGSGRT 
YQLREGSNIIGRGQDAQFRLPDTGVSRRHLEIRWDGQVALLADLNSTNGTTVNNAPVQEWQLADGD 

VIRLGHSEIIVRMH

>Rv0032 bioF2 C-terminal similar to B. subtilis BioF TB.seq 34295:36607 MW:86245
SEQ ID NO:157 

MPTGLGYDFLRPVEDSGINDLKHWFMADI^DGQPLGRANLYSVCFDLATTDRKLTPAWRTTIKRWF 
PGFMTFRFLECGLLTMVSNPLALRSDTDLERVLPVLAGQMDQLAHDDGSDFLMIRDVDPEHYQRYL
DILRPLGFRPALGFSRVDTTISWSSVEEALGCLSHKRRLPLKTSLEFRERFGIEVEELDEYAEHAPVLA 
RLWRNVKTEAKDYQREDLNPEFFMCSRHLHGRSRLWLFRYQGTPIAFFLNVWGADENYILLEWGI 
DRDFEHYRKANLYRAALMLSLKDAISRDKRRMEMGITNYFTKLRIPGARVIPTIYFLRHSTDPVHTATL 
ARMMMHNIQRPTLPDDMSEEFCRWEERIFU.DQDGLPEHDIFRKIDRQHKYTGLKLGGWG^ 
GPQRSTVKAAELGEIVLLGTNSYLGLATHPEWEASAEATRRYGTGCSGSPLLNGTLDLHVSLEQEL
ACFLGKPMVLCSTGYQSNIMISALCESGDMIIQDALNHRSLFDAARLSGADFTLYRHNDMDH^ 
LRRTEGRRRIIWDAWSMEGWADUVTIAELADRHGCRVWDESHALGVLGPDGRGASAALGVU\R 
MDWMGTFSKSFASVGGFIAGDRPWDYIRHNGSGHVFSASLPPAAAAATHAALRVSRREPDRRAR 
VLAAAEYMATGIARQGYQAEYHGTAIVPV^ 
TSYLADHRQSDLDRALHVFAGLAEDLTPQGAAL

>Rv0050 ponA1 TB.seq 53661:55694 MW:71119 SEQ ID NO:158

WILLPMWFTMAYLIVDVPKPGDIRTNQVSTILASDGSEIAKIVPPEGNRVDVNLSQVPMHVRQAVIAA 
EDRNFYSNPGFSFTGFARAVKNNLFGGDLQGGSTITQQYVKNALVGSAQHGWSGLMRKAKELVIAT 
KMSGEWSKDDVLQAYLNIIYFGRGAYGISAASKAYFDKPVEQLTVAEGALLAALIRRPSTLDPAVDPE
GAHARWNVVVLDGMVETKALSPNDRMQVFPEWPPDLARAENQTKGPNGLIERQVTRELLELFNID 
EQTLNTQGLWmiDPQAQRAAEKAVAKYLDGQDPDMRAAWSIDPHNGAVRAYYG 

AQAGLQTGSSFKVFALVAALEQGIGLGYQVDSSPLTVDGIKITNVEGEGCGTCNIAEALKMSLNTSYY 

RLMLKLNGGPQAVADAAHQAGIASSFPGVAHTLSEDGKGGPPNNGIVLGQYQTRVIDMASAYATLAA 

SGIYHPPHFVQKWSANGQVLFDASTADNTGDQRIPKAVADNVTAAMEPIAGYSRGHNLAGGRDSA 

AKTGTTQFGDTTANKDAWMVGYTPSLSTAVWVGTVKGDEPLVTASGAAIYGSGLPSDIWKATMDGA 

LKGTSNETFPKPTEVGGYAGVPPPPPPPEVPPSEWIQPTVEIAPGITIPIGPPTTITLAPPPPAPPAAT 

PTPPP 

>Rv0051 - TB.seq 55694:57373 MW:61210 SEQ ID NO:159

VTGALSQSSNISPLPLAADLRSADNRDCPSRTDVLGAALANWGGPVGRHALIGRTRLMTPLRVMFAI 

ALVFLALGWSTKAACLQSTGTGPGDQRVANWDNQRAYYQLCYSDTVPLYGAELLSQGKFPYKSSWI 

ETDSNGTPQLRYDGQIAVRYMEYPVLTGIYQYLSMAIAKTYTALSKVAPLPWAEWMFFNVAAFGLA 

UWLTTVWATSGLAGRRIWDAALVAASPLVIFQIFTNFDALATGLATSGLLAWARRRPVLAGVLIGLG 

SMKLYPLLFLYPLLLLGIRAGRLNALARTMAAAAATWLLVNLPVMLLFPRGWSEFFRLNTRRGDDM 

DSLYNWKSFTGWRGFDPTLGFWEPPLVLN7WTLLFVLCCAAIAYIALTAPHRPRVAQLTFLTVASFL 

LVNKVWSPQFSLWLVPL^VUU-PHRRILLAVW^ 

IAVMVLCGLWWQIYRPGRDLVRTGGPGALPACGGVDDPVGGVFANAADAPPGRLPSWLRPRLGD 
EHARERTPDAGRDRTFSGQHRA 

>Rv0106 - TB.seq 124372:125565 MW:43701 SEQ ID NO:160 

MRTPVILVAGQDHTDEWGALLRRTGTNAA/EHRFDGHWRRMTATLSRGELITTEDALEFAHGCVSC 

TIRDDLLVLLRRLHRRDNVGRIWHUVPWLEPQPICWAIDHVRVCVGHGYPDGPAALDVRVAAWTC 

VDCVRWLPQSLGEDELPDGRTVAQVTVGQAEFADLLVLTHPEPVAVAVLRRLAPRARITGGVDRVEL 

ALAHLDDNSRRGRTDTPHTPLLAGLPPLAADGEVAIVEFSARRPFHPQRLHAAVDLLLDGWRTRGR 

LWI^NRPDQVMWLESAGGGLRVASAGKWLAAMAASEVAYVDLERRLFADLMWVYPFGDRHTAMT 

VLVCGADPTDIVNALNAALLSDDEMASPQRWQSYVDPFGDWHDDPCHEMPDAAGEFSAHRNSGES 

R 

>Rv0125 - TB.seq 151146:152210 MW:34927 SEQ ID NO:161

MSNSRRRSLRWSWLLSVLAAVGLGLATAPAQAAPPALSQDRFADFPALPLDPSAMVAQVGPQWNI 
NTKLGYNNAVGAGTGIVIDPNGWLTNNHVIAGATDINAFSVGSGQTYGVDWGYDRTQDVAVLQLR 
GAGGLPSAAIGGGVAVGEPWAMGNSGGQGGTPRAVPGRWALGQTVQASDSLTGAEETLNGLIQ 
FDAAIQPGDSGGPWNGLGQWGMNTAASDNFQLSQGGQGFAIPIGQAMAIAGQIRSGGGSPTVHI 
GPTAFLGLGWDNNGNGARVQRWGSAPAASLGISTGDVITAVDGAPINSATAMADALNGHHPGDVI 
SVTWQTKSGGTRTGNVTLAEGPPA 

>Rv0350 dnaK 70 kD heat shock protein, chromosome replication TB.seq 419833:421707
MW:66832 SEQ ID NO:162
MARAVGIDLGTTNSWSVLEGGDPWVANSEGSRTTPSIVAFARNGEVLVGQPAKNQAVTNVDR7V

RSVKRHMGSDWSIEIDGKKYTAPEISARILMKLKRDAEAYLGEDITDAVITTPAYFNDAQRQATKDAG 

QIAGLNVLRIVNEPTAAALAYGLDKGEKEQRILVFDLGGGTFDVSLLEIGEGWEVRATSGDNHLGGD 

DWDQRWDWLVDKFKGTSGIDLTKDKMAMQRLREAAEKAKIELSSSQSTSINLPYITVDADKNPLFLD 

EQLTRAEFQRITQDLLDRTRKPFQSVIADTGISVSEIDHWLVGGSTRMPAVTDLVKELTGGKEPNKG 

VNPDEWAVGMLQAGVLKGEWDVLLLDWPLSLGIETKGGVMTRLIERNTTIPTKRSETFTTADDN 

QPSVQIQVYQGEREIAAHNKLLGSFELTGIPPAPRGIPQIEVTFDIDANGIVHVTAKDKGTGKENTIRIQ 

EGSGLSKEDIDRMIKDAEAHAEEDRKRREEADVRNQAETLVYQTEKFVKEQREAEGGSKVPEDTLN 

KVDAAVAEAKAALGGSDISAIKSAMEKLGQESQALGQAIYEAAQAASQATGAAHPGGEPGGAHPGS 
ADDWDAEWDDGREAK 

>Rv0351 grpE stimulates DnaK ATPase activity TB.seq 421707:422411 MW:24501
SEQ ID NO:163

vTDGNQKPDGNSGEQVTVTDKRRIDPETGEVRHVPPGDMPGGTAAADAAHTEDKVAELTADLQRV 
QADFANYRKRALRDQQAAADRAKASWSQLLGVLDDLERARKHGDLESGPLKSVADKLDSALTGLG 
LVAFGAEGEDFDPVLHEAVQHEGCX3GQGSKPVIGTVMRQGYQLGEQVLRHALVGWDTWVDAAE 
LESVDDGTAVADTAENDQADQGNSADTSGEQAESEPSGS 

>Rv0352 dnaJ acts with GrpE to stimulate DnaK ATPase TB.seq 422450:423634 MW:41346
SEQ ID NO:164

MAQREVWEKDFYQELGVSSDASPEEIKRAYRKLARDLHPDANPGNPAAGERFKAVSEAHNVLSDPA 

KRKEYDETRRLFAGGGFGGRRFDSGFGGGFGGFGVGGDGAEFNLNDLFDAASRTGGTTIGDLFGG 

LFGRGGSARPSRPRRGNDLETETELDFVEAAKGVAMPLRLTSPAPCTNCHGSGARPGTSPKVCPTC 

NGSGVINRNQGAFGFSEPCTDCRGSGSIIEHPCEECKGTGVTTRTRTINVRIPPGVEDGQRIRLAGQ 

GEAGLRGAPSGDLY\nVrWRPDKIFGRDGDDLTVlVPVSFTELALGSTLSWTLDGTVGVRVPKGTA 

DGRILRVRGRGVPKRSGGSGDLLVTVKVAVPPNLAGAAQEALEAYAAAERSSGFNPRAGWAGNR 

>Rv0363c fba fructose bisphosphate aldolase TB.seq 441266:442297 MW:36545

SEQ ID NO:165 

MPIATPEWAEMLGQAKQNSYAFPAINCTSSETVNAAIKGFADAGSDGIIQFSTGGAEFGSGLGVKDM 

VTGAVALAEFTHVIAAKYPVNVALHTDHCPKDKLDSWRPU^ISAQRVSKGGNPLFQSHMWDGSAV 

PIDENLAIAQELLKAAAAAKIILEIEIGWGGEEDGVANEINEKLYTSPEDFEKTIEALGAGEHGKYLLAA 

TFGNVHGVVKPGrWKLRPDIlAQGQQVAAAKLGLPADAKPFDFVFHGGSGSLKSEIEEALRYGWKM 

NVDTDTQYAFTRPIAGHMFTNYDGVLKVDGEVGVKKVYDPRSYLKKAEASMSQRWQACNDLHCA 
GKSLTH 

>Rv0405 pks6 TB.seq 485729:489934 MW:147615 SEQ ID NO:166

MTDGSVTADKLQKWFREYLSTHIECHPNEVSLDVPIRDLGLKSIDVLAIPGDLGDRFGFCIPDLAVWD 
NPSANDLIDSLLNQRSADSLRESHGHADRNTQ6RGSINEPVAVI6VGCRFPGDIDGPERLWDFLTEK 
KCAITAYPDRGFTNAGTFAESGGFLKDVAGFDNRFFDIPPDEALRMDPQQRLLLEVSWEALEHAGIIP

ESLRLSRTGVFVGVSSTDYVRLVSASAQQKSTIWDNTGGSSSIIANRISYFLDIQGPSIVIDTACSSSLV 

AVHLACRSLSTWDCDIALVGGTNVLISPEPWGGFREAGILSQTGCCHAFDKSADGMVRGEGCGVIVL 

QRLSDARLEGRRILAILTGSAVNQDGKSNGIMAPNPSAQIGVLENACKSARVDPLEIGYVEAHGTGTS 

LGDRIEAHALGMVFGRKRPGSGPLMIGSIKPNIGHLEGAAGIAGLIKAVLMVERGSLLPSGGFTEPNP 

AIPFTELGLRWDELQEWPWAGRPRRAGVSSFGFGGTNAHVIVEEAGSVGADTVSGRADVGGSGG 

GWAVWISGKTASALMQAGRLGRWRARPALDWDVGYSLVSTRSVFDHRAVWGQTRDELLAGL 

AGWAGRPEAGWCGVGKPAGKTAFVFAGQGSQWLGMGSELYAAYPVFAEALDAWDELDRHLRY 

PLRDVIWGHDQDLLNTTEFAQPALFAVEVALYRLU^SWGVRPGLVLGHSVGEUW\HVAGALCLPD 

AAMLVMRGRLMQALPAGGAMFAVQAREDEVAPMLGHDVSIAAVNGPASWISGAHDAVSAIADRL 

RGQGRRVHRLAVSHAFHSALMEPMIAEFTAVAAELSVGLPTIPVISNVTGQLVADDFASADYWARHIR 

AWRFGDSVRSAHCAGASRFIEVGPGGGLTSLIEASI^DAQIVSVPTLRKDRPEPVSVMTAAAQGFV 

SGMGLDWASVFSGYRPKRVELPTYAFQHQKFWLAPAPSVSDPTAAGQIGASDGGAELLASSGFAA 

RU\GRSADEQLAMIEWCEHAMVLGRDGMGLDAGQAFADSGFNSLSAVELRNRLTAWAmPA 

TAIFDHPTPTELAQYL1TQIDGHGSSAAAAANPAERIDALTDLFLQACDAGRDADGWKMVALASNTRE 

RMSSPVRNNVSKNVALUDGISDVWICIPTLWLSDQREYRDIANAMTGRHSVYSLTLPGFDSSDAL 

PQNADMI\mWSNAIIDWGGSCRFVLSGYSSGG\A^YALCSHLSVKHQRNPLGVALIDTYLPSQI^ 

PSMNEGFSPNDTGKGLSREVIRVARMLNRLTATRLTAMWMIFQAWEPGRSMAPVLNiVAKDRIAT 

VENLREERINRWRTAAAEAAYSVAEVPGDHFGMMSTSSEAIATEIHDWISGLVRGPHR 

>Rv0435c - ATPase of AAA-family TB.seq 522348:524531 MW:75315 SEQ ID NO:167 
VTHPDPARQLTLTARLNTSAVDSRRGWRLHPNAIAALGIREWDAVSLTGSRTTAAVAGLAAADTAV 
GWLLDDVTLSNAGLREGTEVIVSPVTWGARSWLSGSTI^^ 
LPRDLGPGTSTSAASRALAAAVGISWfTSELLTWGVDPDGP^ 

QVSISSPEIQIEEIJ<GAQPQAAKLTEWLKLALDEPHLLQTLGAGTNLGVLVSGPAGVGKATLVRA\/CD 

GRRLWLDGPEIGALAAGDRVKAVASAVQAVRHEGGVLLITDADALLPAAAEPVASLILSELRTAVATA 

GWLIATSARPDQLDARLRSPELCDRELGLPLPDAATRKSLLEALLNPVPTGDLNLDEIASRTPGFWA 

DLAALVREAALRAASRASADGRPPMLHQDDLLGALWIRPLSRSASDEVWGDWLODVGDMAAAK 

QALTEAVLWPLQHPDTFARLGVEPPRGVLLYGPPGGGKT^ 

GSSEKAVRELFRRARDSAPSLVFLDELDALAPRRGQSFDSGVSDRWAALLTELDGIDPLRDWMLG 
ATNRPDLIDPALLRPGRLERLVFVEPPDAAARREILRTAGKSIPLSSDVDLDEVAAGLDGYSAADCVAL 
LREAALTAMRRSIDAANVTAADLATARETVRASLDPLQVASLRKFGTKGDLRS 

>Rv0436c pssA CDP-diacylglycerol-serine o-phosphatidyltransferase TB.seq 524531:525388
MW:31219 SEQ ID NO:168

MIGKPRGRRGVNLQILPSAMWLSICAGLTAIKFALEHQPKAAMALIAAAAILDGLDGRVARILDAQSR 
MGAEIDSLADAVNFGVTPALVLWSMLSKWPVGVVWVLLYAVCWLRLARYNALQDD 
FFVGMPAPAGAVSMIGLLALKMQFGEGVW^SGWFLSFVWTGTSILLVSG

ALLAVL^ICAAMVLAPmiWVIIIAYMC^ 

YRPSMARLGLRKPGRRL 

>Rv0440 groEL2 60 kD chaperonin 2 TB.seq 528606:530225 MW:56728 SEQ ID NO:169

MAICTIAYDEEARRGLERGLNALADAVKVTLGPKGRNWLEKKWGAPTITNDGVSIAKEE 
GAELVKEVAKiaDDVAGDGTTTATV^^ 

EVETKEQIAATAAISAGDQSIGDLIAEAMDKVGNEGVITVEESNTFGLQLELTEGMRFDKGYISGYFVT 
DPERQEAVLEDPYIl±VSSKVSTVKDLLPLLEKVIGAGKPLLIIAEDVEGEALSTLWNKIRGTFKSVA\m 
APGFGDRRKAMLQDMAILTGGQVISEEVGLTLENADLSLLGKARKVNA/TKDETTIVEGAGDTDAIA
VAQIRQEIENSDSDYDREKLQERU\KUVGGVAVIKAGAATEVELKERKHRIEDAVRNAKAAVEEGIVA 
GGGWLLQAAPTLDELKLEGDEATGANIVKVALEAPLKQIAFNSGLEPGWAEKVRNLPAGHGLNAQT 
GWEDLU^GVADPVKVTRSALQNMSIAGUFLTTEAWADKPEKEKASVPGGGDMGGMDF 

>Rv0482 murB TB.seq 570537:571643 MW:38522 SEQ ID NO:170

MKRSGVGSLFAGAHIAEAVPI^PLmRVGPIARRVITCTSAEQWAALRHLDSAAKTGADRPLWAG 
GSNLVIAENLTDLTWRLANSGITIDGNLVRAEAGAVFDDNA/VRAIEQGLGGLECLSGIPGSAGATPVQ 
NVGAYGAEVSDTITRVRLLDRCTGEVRWVSARDLRFGYRTSVLKHADGUVVPTVVLEVEF 
SAPLRYGEUAALNATSGERADPQAVREAVL^LRARKGMVLDPTDHDTW^ 

RLAGDAATRKDGPVPHYPAPDGVKLAAGWLVERAGFGKGYPDAGAAPCRLSTKHALALTNRGGAT
AEDWTLARAVRDGVHDVFGITLKPEPVLIGCML 

>Rv0483 - TB.seq 571708:573060 MW:47859 SEQ ID NO:171
WIRVLFRPVSLIPVNNSSTPQSQGPISRRLALTALGFGVI^ 
RPADSAADWPIAPISVEVGDGWFQRVALTNSAGKWAGAYSRDRTIYTITEPLGYDT7YTW
GHDGKAWVAGKFTWAPWINAGFQLADGQWGM^ 

WAWLPDEAQGARVHWRPREYYPAGTTVDVDAKLYGLPFGDGAYGAQDMSLHFQIGRRQWKAEV 
SSHRIQWTDAGVIMDFPCSYGEADLARNVTRNGIHVVTEKYSDFYMSNPMGYSHIHE 
NGEFIHANPMSAGAQGNSNVTNGCINLSTENAEQYYRSAWGDPVEVTGSSIQLSYADGDIWDWAV 
DWDTWVSMSALPPPAAKPAATQIPVTAPVTPSDAPTPSGTPTTTNGPGG

>Rv0489 gpm phosphoglycerate mutase I TB.seq 578424:579170 MW:27217 SEQ ID NO:172
MANTGSLVLLRHGESDWNALNLFTGWVDVGLTDKGQAEAVRSGELIAEHDLLPDVLYTSLLRF^I^ 
AHL\LDSADRLWIPVRRSWRLNERHYGALQGLDKAETKARYGEEQFMAWRRSYDTPPPPIERGSQ 
FSQDADPRYADIGGGPLTECUVDWARFLPYFTDVIVGDLRVGKWLIVAHGNSLRALVKHLDQMSDD
EIVGLNIPTGIPLRYDLDSAMRPLVRGGTYLDPEAAAAGAAAVAGQGRG 
>Rv0490 senX3 sensor histidine kinase TB.seq 579347:580576 MW:44794 SEQ ID NO:173
VWFSALLLAGVLSALAUVGGAVGMRLTSRWEQRQRVATEWSGITVSQMLCKJimMPLGAAWD 
THRD\^LNERAKELGLVRDRQLD[X^VVRAARQALGGEDV6FDLSPRKRSATGRSGLSVHGHARL 
LSEEDRRFAWFVHDQSDYARMEAARRDFVANVSHELKTPVGAMALLAEALLASADDSETVRRFAE 
KVLIEANRLGDMVAELIELSRLQGAERLPNMTDVDVDTIVSEAISRHKVAADNADIEVRTDAPSNLRVL
GDQTLLVTALANLVSNAIAYSPRGSLVSISRRRRGANIEIAVTDRGIGIAPEDQERVFERFFRGDKARS 
RATGGSGLGLAIVKHVAANHDGTIRVWSKPGTGSTFTLALPALIEAYHDDERPEQAREPELRSNRSQ 
REEELSR 

>Rv0500 proC pyrroline-5-carboxylate reductase TB.seq 590081:590965 MW:30172
SEQ ID NO:174 

MLFGMARIAIIGGGSIGEALLSGLLRAGRQVKDLWAERMPDRANYLAQTYSVLVTSAADAVENATFV 
WAVKPADVEPVIADLANATAAAENDSAEQVFVTWAGITIAYFESKLPAGTPWRAMPNAAALVGAG 
WALAKGRFVTPQQLEEVSALFDAVGGVLTVPESQLDAVTAVSGSGPAYFFLLVEALVDAGVGVGLS 
RQVATDUVAQTMAGSAAMLLERMEQDQGGANGELMGLRVDLTASRLRAAVTSPGGTTAAALRELE 
RGGFRMAVDAAVQAAKSRSEQLRITPE 

>Rv0528 - TB.seq 618303:619889 MW:57132 SEQ ID NO:175 

I^RSLTSMGTAL\^LFLLAIJ\AIPGALLPQRGLNAAKVDDYU^HPLIGPWLDELQAFDVFSSFWFTA 
lYVLLFVSLVGCI^PRTIEHARSLRATPVAAPRNLARLPKHAHARLAGEPAALAATITGRLRGWRSITR 
QQGDSVEVSAEKGYLREFGNLVFHFALLGLLVAVAVGKLFGYEGNVIVIADGGPGFCSASPAAFDSF 
RAGNTVDGTSLHPICVRVNNFQAHYLPSGQATSFAADIDYQADPATADLIANSWRPYRLQVNHPLRV 
GGDRWLQGHGYAPTFTVTFPDGQTRTSTVQWRPDNPQTLLSAGWRIDPPAGSYPNPDERRKHQI 
AIQGLLAPTEQLDGTLLSSRFPALNAPAVAIDIYRGDTGLDSGRPQSLFTLDHRLIEQGRLVKEKRVNL 
RAGQQVRIDQGPAAGTWRFDGAVPFVNLQVSHDPGQSVVVLVFAJTMMAGLLVSLLVRRRRVWARI 
TFTTAGTVNVELGGLTRTDNSGWGAEFERLTGRLLAGFEARSPDMAEAAAGTGRDVD 

>Rv0667 rpoB [beta] subunit of RNA polymerase TB.seq 759805:763320 MW:129220 
SEQ ID NO:176 

LADSRQSKTAASPSPSRPQSSSNNSVPGAPNRVSFAKLREPLEVPGLLDVQTDSFEWLIGSPRWRE
SAAERGDVNPVGGLEEVLYELSPIEDFSGSMSLSFSDPRFDDVKAFVDECKDKDMTYAAPLFVTAEF 
INNNTGEIKSQWFMGDFPM^EKGTFIINGTERVWSQLVRSPGVYFDETIDKSTDICrLHSVKVIPSR 
GAWLEFDVDKRDWGVRIDRKRRQPVmLKALGWTSEQIVERFGFSEIMRSTLEKDNTVGTDEALL 
DIYRKLRPGEPPTKESAQTLLENLFFKEKRYDLARVGRYKVNKKLGLHVGEPITSSTLTEEDWATIEY 

LVRLHEGQTTMWPGGVEVPVETDDIDHFGNRRLRTVGELIQNQIRVGMSRMERWRERMTTQDVE
AITPQTLINIRPWAAIKEFFGTSQLSQFMDQNNPLSGLTHKRRLSALGPGGLSRERAGLEVRDVHPS 
HYGRMCPIETPEGPNIGUGSLSWARVNPFGFIETPYRKWDGWSDEIWLTADEEDRHWAQANS 
PIDADGRFVEPRVLVRRKAGEVEYVPSSEVDYMDVSPRQMVSVATAMIPFLEHDDANRALMGANMQ

RQAVPLVRSEAPLVGTGMELRAAIDAGDWVAEESGVIEEVSADYITVMHDNGTRRTYRMRKFARSN

HGTCANQCPIVDAGDRVEAGQVIADGPCTDDGEMAL6KNLLVAIMPWEGHNYEDAIILSNRLVEEDV 

LTSIHIEEHEIDARDTKLGAEEITRDIPNISDEVLADLDERGIVRIGAEVRDGDILVGKVTPKGETELTPE 

ERU.RAIFGEKAREVRDTSLKVPHGESGKVIGIRVFSREDEDELPAGVNELVRVYVAQKRKISDGDKL 

AGRHGNKGVIGKILPVEDMPFLADGTPVDIILNTHGVPRRMNIGQILETHLGWCAHSGWKVDAAKGV 

PDWAARLPDELLEAQPNAIVSTFVFDGAQEAELQGLLSCTLPNRDGDVLVDADGKAMLFDGRSGEP 

FPYPVTVGYMYIMKLHHLVDDKIHARSTGPYSMITQQPLGGKAQFGGQRFGEMECWAMQAYGAAY 

TLQELLTIKSDDTVGRVKVYEAIVKGENIPEPGIPESFKVLLKELQSLCLNVEVLSSDGAAIELREGEDE 

DLERAAANLGINLSRNESASVEDLA 

>Rv0668 rpoC [beta]' subunit of RNA polymerase TB.seq 763368:767315 MW:146740 
SEQ ID NO:177

VLDVNFFDELRIGLATAEDIRQWSYGEVKKPETINYRTLKPEKDGLFCEKIFGPTRDWECYCGKYKRV 

RFKGIICERCGVEVTRAKVRRERMGHIEI^APNn'HIWYFKGVPSRLGYLLDLAPKDLEKIIYFAAYVITS 

VDEEMRHNELSTLEAEMAVERKAVEDQRDGELEARAQKLEADLAELEAEGAKADARRKVRDGGER 

EMRQIRDRAQRELDRLEDIWSTFTKLAPKQLIVDENLYRELVDRYGEYFTGAMGAESIQKLIENFDIDA 

EAESLRDVIRNGKGQKKLRALKRLKWAAFQQSGNSPMGMVLDAVPVIPPELRPMVQLDGGRFATS 

DLNDLYRRVINRNNRLKRLIDLGAPEIIVNNEKRMLQESVDALFDNGRRGRPVTGPGNRPLKSLSDLL 

KGKQGRFRQNLLGKRVDYSGRSVIWGPQLKLHQCGLPKLMALELFKPFVMKRUVDLNHAQNIKSAK 

RMVERQRPQVWDVLEEVIAEHPVLLNRAPTLHRLGIQAFEPMLVEGKAIQLHPLVCEAFNADFDGDQ 

MAVHLPLSAEAQAEARILMLSSNNILSPASGRPLAMPRLDMVTGLYYLTTEVPGDTGEYQPASGDHP 

ETGVYSSPAEAIMAADRGVLSVRAKIKVRLTQLRPPVEIEAELFGHSGWQPGDAWMAETTLGRVMF 

NELLPLGYPFVNKQMHKKVQAAIINDLAERYPMlWAQWDKLKDAGFYWATRSGVTVSMADVLVPP 

RKKEILDHYEERADKVEKQFQRGALNHDERNEALVEIWKEATDEVGQALREHYPDDNPIITIVDSGAT 

GNFTQTRTUGMKGLVTNPKGEFIPRPVKSSFREGLTVLEYFINTHGARKGLADTALRTADSGYLTRR 

LVDVSQDVIVREHDCQTERGIWELAERAPDGTLIRDPYIETSAYARTLGTDAVDEAGNVIVERGQDL 

GDPEIDALLAAGITQVKVRSVLTCATSTGVCATCYGRSMATGKLVDIGEAVGIVAAQSIGEPGTQLTM 

RTFHQGGVGEDITGGLPRVQELFEARVPRGKAPIADVTGRVRLEDGERFYKITIVPDDGGEEWYDKI 

SKRQRLRVFKHEDGSERVLSDGDHVEVGQQLMEGSADPHEVLRVQGPREVQIHLVREVQEVYRAQ 

GVSIHDKHIEVIVRQMLRRVniDSGSTEFLPGSLIDRAEFEAENRRWAEGGEPAAGRPVLMGITKAS 

LATDSWLSAASFQETTRVLTDAAINCRSDKLNGLKENVIIGKLIPAGTGINRYRNIAVQPTEEARAAAYT 

IPSYEDQYYSPDFGAATGAAVPLDDYGYSDYR 

>Rv0711 atsA TB.seq 806333:808693 MW:86216 SEQ ID NO:178

MAPEATEAFNGTIELDIRDSEPDWGPYAAPVAPEHSPNILYLVWDDVGIATWDCFGGLVEMPAMTRV 
AERGVRLSQFHTTALCSPTRASLLTGRNATTVGMATIEEFTDGFPNCNGRIPADTALLPEVLAEHGYN 
TYCVGKWHLTPLEESNMASTKRHWPTSRGFERFY6FLGGETDQWYPDLVYDNHPVSPPGTPEGG
YHLSKDIADKTIEFIRDAKVIAPDKPWFSYVCPGAGHAPHHVFKEWADRYAGRFDMGYERYREIVLE 
RQKALGIVPPDTELSPINPYLDVPGPNGETWPLQDTVRPWDSLSDEEKKI.FCRMAEVFAGFLSYTDA 
QIGRILDYLEESGQLDNTIIWISDNGASGEGGPNGSVNEGKFFNGYIDTVAESMKLFDHLGGPQTYN 

HYPI6WAMAFNTPYKLFKRYASHEGGIADPAIISWPNGIAAHGEIRDNYVNVSDITPTVYDLLGMTPP
GTWGIPQKPMDGVSFIAALADPAADTGKTTQFYTMLGTRGIVVHEGWFANTIHAATPAGWSNFNAD 
RWELFHIAADRSQCHDLAAEHPDKLEELKALWFSEAAKYNGLPLADLNLLETMTRSRPYLVSERASY 
VYYPDCADV6IGAAVEIRGRSFAVLADVTIDTTGAEGVLFKHGGAHGGHVLFVRDGRLHYVYNFLGE 
RQQLVSSSGPVPSGRHLLGVRYLRTGTVPNSHTPVGDLELFFDENLVGALTNVLTHPGTFGLAGAAI 

SVGRNGGSAVSSHYEAPFAFTGGTITQVTVDVSGRPFEDVESDLALAFSRD

>Rv0764c - lanosterol 14-demethylase cytochrome P450 TB.seq 856683:858035 MW:50879
SEQ ID NO:179 

MSAVALPRVSGGHDEHGHLEEFRTDPIGLMQRVRDECGDVGTFQLAGKQWLLSGSHANEFFFRA 
GDDDLDQAKAYPFMTPIFGEGWFDASPERRKEMLHNAALRGEQMKGHAATIEDQVRRMIADWGE
AGEIDLLDFFAELTIYTSSACLIGKKFRDQLDGRFAKLYHELERGTDPLAYVDPYLPIESFRRRDEARN 
GLVALVADIMNGRIANPPTDKSDRDMLDVLIAVKAETGTPRFSADEITGMFISMMFAGHHTSSGTASW 
TLIELMRHRDAYAAVIDELDELYGDGRSVSFHALRQIPQLENVLKETLRLHPPLIILMRVAKGEFEVQG 
HRIHEGDLVAASPAISNRIPEDFPDPHDFVPARYEQPRQEDLLNRWTWIPFGAGRHRCVGAAFAIMQI 
KAIFSVLLREYEFEMAQPPESYRNDHSKMWQLAQPACVRYRRRTGV

>Rv0861c - DNA helicase TB.seq 958524:960149 MW:59773 SEQ ID NO:180 

VQSDKTVLLEVDHEI^GAARMIAPFAELERAPEHVHTYRITPLALWNARAAGHDAEQWDALVSYS 

RYAVPQPLLVDIVDTMARYGRLQLVKNPAHGLTLVSLDRAVLEEVLRNKKIAPMLGARIDDDTVWHP 

SERGRVKQUJ-KIGWPAEDUGYVDGEAHPISLHQEGWQLRDYQRUWDSFWAGGSGVmPCGA
GKTLVGAAAMAKAGATTLILVTNIVAARQWKRELVARTSLTENEIGEFSGERKEIRPVTISTYQMITRR 
TKGEYRHLELFDSRDWGLIIYDEVHLLPAPVFRMTADLQSKRRLGLTATLIREDGREGDVFSLIGPKR 
YDAPWKDIEAQGWIAPAECVEVRWMTDSERMMYATAEPEERYRICSWHTKIAWKSILAKHPDEQ 
TLVIGAYLDQLDELGAELGAPVIQGSTRTSEREALFDAFRRGEVATLWSKVANFSIDLPEAAVAVQVS 

GTFGSRQEEAQRLGRILRPKADGGGAIFYSWARDSLDAEYAAHRQRFLAEQGYGYIIRDADDLLGP
Al 

>Rv0904c accD3 TB.seq 1006694:1008178 MW:51741 SEQ ID NO:181 

VSRITTDQLRHAVLDRGSFVSWDSEPLAVPVADSYARELAAARAATGADESVQTGEGRVFGRRVAV 
VACEFDFLGGSIGVAAAERITAAVERATAERLPLLASPSSGGTRMQEGTVAFLQMVKIAAAIQLHNQA
RLPYLVYLRHPTTGGVFASWGSLGHLTVAEPGALIGFLGPRVYELLYGDPFPSGVQTAENLRRHGIID 
GWALDRLRPMLDRAUTVLIDAPEPLPAPQTPAPVPDVPTWDSWASRRPDRPGVRQLLRHGATDR 
VLLSGTDQGEMTTLLALARFGGQPTWLGQQRAV6GGGSTVGPAALREARRGMALAAELCLPLVL
VIDMGPALSAMEQGGUGQIAHCL^LmDTPWSILLG(MSGGPALAMLPADRVLAALHGWLAP 
LPPEGASAIVFRDTAHAAELAAAQGIRSADLLKSGIVDTIVPEYPDAADEPIEFALRLSNAIAAEVHALR 
KIPAPERLATRLQRYRRIGLPRD 

>Rv0983 - TB.seq 1099064:1100455 MW:46454 SEQ ID NO:182 

MAKLARWGLVQEEQPSDH4TNHPRYSPPPQQPGTPGYAQGQQQTYSQQFDWRYPPSPPPQPTQY 

RQPYEALGGTRPGLIPGVIPTMTPPPGMVRQRPRAGMLAIGAVTIAWSAGIGGAAASLVGFNRAPA 

GPSGGPVAASAAPSIPAANMPPGSVEQVAAKWPSWMLETDLGRQSEEGSGIILSAJEGLILTNNHVI 

AAAAKPPLGSPPPKTTVTFSDGRTAPFTWGADPTSDIAWRVQGVSGLTPISLGSSSDLRVGQPVLA 

IGSPLGLEGTVTTGIVSALNRPVSTTGEAGNQNTVLDAJQTDAAINPGNSGGALVNMNAQLVGVNSAI 

ATUGADSADAQSGSIGLGFAIPVDQAKRIADELISTGKASHASLGVQVTNDKDTLGAKIVEWAGGAA 

ANAGVPKGWVTKVDDRPINSADALVAAVRSKAPGATVALTFQDPSGGSRTVQVTLGKAEQ 

>Rv1008 - Similar to E.coli protein YcfH TB.seq 1127087:1127878 MW:29066 SEQ ID NO:183

LVDAHTHLDACGARDADWRSLVERAAAAGWA\AnVADDLESARVWTRAAEWDRRWAAVALHPT 

RADALTDAARAELERLVAHPRWAVGETGIDMYVVPGRLDGCAEPHVQREAFAWHIDLAKRTGKPLM 

IHNRQADRDVLDVLRAEGAPDWILHCFSSDAAMARTCVDAGWLLSLSGTVSFRTARELREAVPLMP 

VEQLLVETDAPYLTPHPHRGLANEPYCLPYTVRALAELVNRRPEEVALITTSNARRAYGLGWMRQ 

>Rv1009 - lipoprotein, similar to various other MTB proteins TB.seq 1128089:1129174 MW:38079
SEQ ID NO:184 

MLRLWGALLLVU\FAGGYAVAACKTWLTVDGTAMRVTTMKSRVIDIVEENGFSVDDRDDLYPAAG 

VQVHDADTIVLRRSRPLQISLDGHDAKQVV\nTASWDEALAQLAMTDTAPAAASRASRVPLSGMALP 

WSAKWQLNDGGLVRWHLPAPNVAGLLSAAGVPLLQSDhiWPAATAPIVEGMQIQVTRNRIKKVTE 

RLPLPPNARRVEDPEMNMSREWEDPGVPGTQDVTFAVAEVNGVETGRLPVANWvTPAHEAWR 

VGTKPGTEVPPVIDGSIWDAIAGCEAGGNWAINTGNGYYGGVQFDQGTWEANGGLRYAPRADLAT 

REEQIAVAEvTRLRQGWGAWPVCAARAGAR 

>Rv1010 ksgA 16S rRNA dimethyltransferase TB.seq 1129150:1130100 MW:34647
SEQ ID NO:185 

MCCTSGCALTIRLLGRTEIRRLAKELDFRPRKSLGQNFVHDANTVRRWAASGVSRSDLVLEVGPGL 

GSLTLALLDRGATVTAVEIDPLLASRLQQWAEHSHSEvlHRLTWNRDVU\LRREDLAAAPTAvVANL 

PYNVAVPALLHLLVEFPSIRWTVMVQAEVAERLAAEPGSKEYGVPSVKLRFFGRVRRCGMVSPTVF 

WPIPRVYSGLVRIDRYETSPWPTDDAFRRRVFELVDIAFAQRRKTSRNAFVQWAGSGSESANRLLAA 

SIDPARRGETLSIDDFVRLLRRSGGSDEATSTGRDARAPDISGHASAS 
>Rv1011 - Similar to E.coli protein YcbH TB.seq 1130189:1131106 MW:31350
SEQ ID NO:186

VPTGSVWRVPGKVNLYLAVGDRRED^ 

ERNU^WQAAELMAEHVGRAPDVSIMIDKSIPVAGGMAGGSADAMVLVAMNSLWELNVPRRDLRML 
MRLGSDVPFALHGGTALGTGRGEEU\TVLSRNTPHV\MJVFADSGLLTSAWNELDRLREVGDP 
GEPGPVUVAUWSDPIXILAPLLGNEMQAAA^ 
ASSAIDVGAQLSGAGVCRTVRVATGPVPGARWSAPTEV 

>Rv1106c - cholesterol dehydrogenase TB.seq 1232845:1233954 MW:40743 SEQ ID NO:187

MLRRMGDASLTTELGRVLWGGAGFVGANLNmLLDRGHWVRSFDRAPSLLPAHPQLEVLQGDITD 

ADVCAMVDGIDTIFHTAAIIELMGGASVTDEYRQRSFAVNVGGTENLLHAGQRAGVQRFVYTSSNS 

WMGGQNIAGGDETLPYTDRFNDLYTETKWAERFVl^QNGVDGMLTCAIRPSGIWGNGDQ 

LFESVLKGHVKVLVGRKSARLDNSYVHNLIHGFIU^AAHLVPDGTAPGQAYFIND 

EACGQRWPKMRISGPAVRVVVMTGWQRLHFRFGFPAPLLEPLAVERLYLDNYFSIAKARRDLGYEPL 

FTTQQALTECLPYYVSLFEQMKNEARAEKTAATVKP 

>Rv1110 lytB2 TB.seq 1236183:1237187 MW:36298 SEQ ID NO:188 

MVPWDMGIPGASVSSRSVADRPNRKRVLLAEPRGYCAGVDRAVEWERALQM 

NRHWDTUVKAGAVFVEETEQVPEGAIWFSAHGVAPTVHVSASERNLQVIDATCPLW 

FARDDYDILLIGHEGHEEWGTAGEAPDHVQLVDGVDAVDQVWRDEDIWVWLSQmSVDETMEIV 

GRLRRRFPKLQDPPSDDICYATQNRQVAVKAMAPECELVIWGSRNSSNSVRLVEVALGAGARAAH 

LVDWADDIDSAWLDGVTWGVTSGASVPEVLVRGVLER^ 

PR 

>Rv1216c - TB.seq 1359473:1360144 MW:24863 SEQ ID NO:189 
MHIGLKIFIWGVLGLWFGALLFGPAGTFDYW^ 

LAEGRTIQKFIVIGAFLGFFAMI^LSACDHRYGWSSVPMVCVIGDVLVMTGLGIAMLWIQNRYMS 

WRVEAGQIUVSDGLYKIVRHPMYAGNWMMTGIPLALGSYWAMFILVPGTLVLVFRI 

SGYREYRQLVRYRLVPYVW 

>Rv1223 htrA TB.seq 1365810:1367456 MW:56547 SEQ ID NO:190 

VSHLSQRMAGLLRVHGEWSRSVDTRVDTDNAMPARFSAQIQNEDEVTSDQGNNGGPNGGGRLAP 

R pVFRPPVDPASRQ^GRPSGVQGSFVAERVRPQKYQDQSDFTPNDQLADPVLQEAFGRPFAGAE 

SLQRHPIDAGAUVAEKDGAGPDEPDDPWRDPAAAAALGTPALAAPAPHGALAGSGKLGVRDVLFGG 

KVSYU\LGILVAIALVIGGIGGVIGRKTAEWDAFTTSKVTLSTTGNAQ 

SVSDQEGMQGSGVIVDGRGYI\n"NNHVISEAANNPSQFKT^ 

VDNVDNLWARLGDSSKVRVGDEVUVGAPLGLRST^C^IVSALHRPVPLSGEGSDTDTVIDAIQTD 
ASINHGNSGGPLIDMDAQVIGINTAGKSLSDSASGLGFAIPVNEMKLVANSLIKDGKIVHPTLGISTRSV

SNAIASGAQVANVKAGSPAQKGGILENDVIVKVGNRAVADSDEFWAVRQLAIGQDAPIEWREGRH 

VTLTVKPDPDST 

>Rv1224 - TB.seq 1367461:1367853 MW:14083 SEQ ID NO:191

VFANIGWWEMLVLVMVGLWLGPERLPGAIRWAASALRQARDYLSGVTSQLREDIGPEFDDLRGHL 
GELQKLRGMTPRAALTKHLLDGDDSLFTGDFDRPTPKKPDAAGSAGPDATEQIGAGPIPFDSDAT 

>Rv1229c mrp similar to MRP/NBP35 ATP-binding proteins TB.seq 1371778:1372947 MW:41064 
SEQ ID NO:192 

MPSRLHSAVMSGTRDGDLNAAIRTALGKVIDPELRRPITELGMVKSIDTGPDGSVHVEIYLTIAGCPKK 

SEITERVTRAVADVPGTSAVRVSLDVMSDEQRTELRKQLRGDTREPVIPFAQPDSLTRVYAVASGKG 

GVGKSTVWNLAAAMAVRGLSIGVLDADIHGHSIPRMMGTTDRPTQVESMILPPIAHQVKVISIAQFTQ 

GNTPWWRGPMLHRALQQFUVDVYWGDLDVLLLDLPPGTGDVAISVAQLI^ 

VAERAGSIALQTRQRIVGWENMSGLTLPDGTTMQVFGEGGGRLVAERLSRAVGADVPLLGQIPLDP 

ALVAAGDSGVPLVLSSPDSAIGKELHSIADGLSTRRRGLAGMSLGLDPTRR 

>Rv1239c corA magnesium and cobalt transport protein TB.seq 1381943:1383040 MW:41470 
SEQ ID NO:193

VFPGFDALPEVLRPVARPQPPNAHPVAQPPAQALVDCGNAVCGQRLPGKYTYAAALREVREIELTG 

QEAFVWIGLHEPDENQMQDVADVFGLHPLAVEDAVHAHQRPKLERYDETLFLVLKTVNYVPHESW 

LAREIVKTGEIMIFVGKDFWWRHGEHGGLSEVRKRMDADPEHLRLGPYAVMHAIADYWDHYLEVT 

NLMETDIDSIEEVAFAPGRKLDIEPIYLLKREWELRRCVNPLSTAFQRMQTESKDLISKEVRRYLRDV 

ADHQTEMDQIASYDDMLNSLVQAALARVGMQQNMDMRKISAWAGIIAVPTMIAGIYGMNFHFMPEL 

DSRWGYPTVIGGMVLICLFLYHVFRNRNWL 

>Rv1279 - TB.seq 1430060:1431643 MW:57332 SEQ ID NO:194

MDTQSDYNAA/GTGSAGAWASRLSTDPATTWALEAGPRDKNRFIGVPAAFSKLFRSEIDWDYLTEP 

QPELDGREIYWPRGKVLGGSSSMNAMMWVRGFASDYDEWAARAGPRWSYADVLGYFRRIENVTA 

AWHFVSGDDSGWGPLHISRQRSPRSVn^AAWLAAARECGFAAARPNSPRPEGFCETWTQRRGAR 

FSTADAYLKPAMRRKNLRVLTGATATRWIDGDRAVGVEYQSDGQTRIVYARREWLCAGAVNSPQL 

LMLSGIGDRDHLAEHDIDTVYHAPEVGCNLLDHLVTVLGFDVEKDSLFAAEKPGQLISYLLRRRGMLT 

SNVGEAYGFVRSRPELKLPDLELIFAPAPFYDEALVPPAGHGWFGPILVAPQSRGQITLRSADPHAK 

PVIEPRYLSDLGGVDRAAMMAGLRICARIAQARPLRDLLGSIARPRNSTELDEATUELALATCSHTLYH 

PMGTCRMGSDEASWDPQLRVRGVDGLRVADASVMPSTVRGHTHAPSVLIGEKAADLIRS 

>Rv1294 thrA homoserine dehydrogenase TB.seq 1449373:1450695 MW:45522 SEQ ID NO:195 

VPGDEKPVGVAVLGLGNVGSEWRIIENSAEDLAARVGAPLVLRGIGVRRVTTDRGVPIELLTDDIEEL 

VAREDVDIWEVMGPVEPSRKAILGALERGKSWTANKALLATSTGELAQAAESAHVDLYFEAAVAGA 
IPVIRPLTQSLAGDTVLRVAGIVNGTTNYILSAMDSTGADYASALADASALGYAEADPTADVEGYDAA
AKAAILASIAFHTRWADDVYREGITKVTPADFGSAHALGCTIKLLSICERITTDEGSQRVSARVYPALV 
PLSHPLAAVNGAFNAVWEAEAAGRLMFYGQGAGGAPTASAVTGDLVMAARNRVLGSRGPRESKY 
AQLPVAPMGFIETRYYVSMNVADKPGVLSAVAAEFAKREVSIAEVRQEGWDEGGRRVGARIVWTH 
LATDAALSETVDALDDLDWQGVSSVI RLEGTGL 

>Rv1323 fadM acetyl-CoA C-acetyltransferase (aka thiL) TB.seq 1485860:1487026 MW:40049
SEQ ID NO:196

VIVAGARTPIGKLMGSLKDFSASELGAIAIKGALEKANVPASLVEYVIMGQVLTAGAGQMPARQAAVA 

AGIGWDVPALTINKMCLSGIDAIAIADQLIRAREFDWVAGGQESMTKAPHLLMNSRSGYKYGDVTVL 

DHMAYDGLHDVFTDQPMGALTEQRNDVDMFTRSEQDEYAAASHQKAAAAVVKDGVFADEVIPVNIP 

QRTGDPLQFTEDEGIRANTTAAALAGLKPAFRGDGTITAGSASQISDGAAAWVMNQEKAQELGLTW 

LAEIGAHGWAGPDSTLQSQPANAINKALDREGISVDQLDWEINEAFAAVALASIRELGLNPQIVNVN 

GGAIAVGHPLGMSGTRITLHAALQLARRGSGVGVAALCGAGGQGDALILRAG 

>Rv1389 gmk putative guanylate kinase TB.seq 1564399:1565022 MW:22064 SEQ ID NO:197 

VSVGEGPDTKPTARGQPAAVGRVWLSGPSAVGKSTWRCLRERIPNLHFSVSATTRAPRPGEVDG 

VDYHFIDPTRFQQLIDQGELLEWAEIHGGLHRSGTLAQPVRAAMTGVPVLIEVDI^GARAIKKTMPE 

AVWFlT^PSWQDLQARLlbRGTETADVIQRRLDTARIELAAQGDFDKVWNRRLESACAELVSLLVG 

TAPGSP 

>Rv1407 fmu similar to Fmu protein TB.seq 1583099:1584469 MW:48494 SEQ ID NO:198

MTPRSRGPRRRPLDPARRAAFETLRAVSARDAYANLVLPALLAQRGIGGRDAAFATELTYGTCRAR 

GLLDAVIGAAAERSPQAIDPVLLDLLRLGTYQLLRTRVDAHAAVSTTVEQAGIEFDSARAGFVNGVLR 

TIAGRDERSWVGEUMPDAQNDPIGHAAFVHAHPRVWAQAFADALGAAVGELEAVLASDDERPAVHLA 

ARPGVLTAGELARAVRGTVGRYSPFAVYLPRGDPGRLAPVRDGQALVQDEGSQLVARALTLAPVDG 

DTGRWLDLCAGPGGKTALLAGLGLQCAARVTAVEPSPHRADLVAQNTRGLPVELLRVDGRHTDLDP 

GFDRVLVDAPCTGLGALRRRPEARWRRQPADVAALAKLQRELLSAAIALTRPGGWLYATCSPHLAE 

TVGAVADALRRHPVHALDTRPLFEPVIAGLGEGPHVQLWPHRHGTDAMFAAALRRLT 

>Rv1409 ribG riboflavin biosynthesis TB.seq 1585192:1586208 MW:35367 SEQ ID NO:199 

MNVEQVKSIDEAMGLAIEHSYQVKGTTYPKPPVGAVIVDPNGRIVGAGGTEPAGGDHAEWALRRAG 

GLMGAIWVTMEPCNHYGKTPPCVNALIEARVGTWYAVADPNGIAGGGAGRLSAAGLQVRSGVLA 

EQVMGPLREWLHKQRTGLPHVTWKYATSIDGRSAAADGSSQWISSEAARLDLHRRRAIADAILVGT 

GWU\DDPALTARI^DGSU^QQPLRWVGKRDIPPEARVLNDEARTMMIRTHEPMEVLRALSDRTD 

VLLEGGPTLAGAFLRAGAINRILAYVAPILLGGPVTAVDDVGVSNITNALRWQFDSVEKVGPDLLLSLV 

AR 
>Rv1440 secG TB.seq 1617715:1618065 MW:12140 SEQ ID NO:200

VAGV^AAVSARLKADEARRPGFYAAGSGPLPQVRGSTLPVMELALQITLIVTSVLWLLVLLHRAKGG 
GLSTLFGGGVQSSLSGSTWEKNLDRLTLFVTGIWLVSIIGVALLIKYR 

>Rv1484 inhA TB.seq 1674200:1675006 MW:28529 SEQ ID NO:201

MTGLLDGKRILVSGIITDSSIAFHIARVAQEQGAQLVLTGFDRLRLIQRITDRLPAKAPLLELDVQNEEH 
LASLAGRVTEAIGAGNKLDGWHSIGFMPQTGMGINPFFDAPYADVSKGIHISAYSYASMAIO^LLPIM 
NPGGSIVGMDFDPSRAMPAYNWMTVAKSALESVNRFVAREAGKYGVRSNLVAAGPIRTLAMSAIVG 
GALGEEAGAQIQLLEEGWDQRAPIGWNMKDATPVAKTVCALLSDWLPATTGDIIYADGGAHTQLL 

>Rv1617 pykA pyruvate kinase TB.seq 1816187:1817602 MW:50668 SEQ ID NO:202
VTRRGKIVCTLGPATQRDDLVRALVEAGMDVARMNFSHGDYDDHKVAYERVRVASDATGRAVGVL 
ADLQGPKIRLGRFASGATHWAEGETVRIWGACEGSHDRVSTTYKRLAQDAVAGDRVLVDDGKVAL 
WDAVEGDDWCTWEGGPVSDNKGISLPGMNvTAPALSEKDIEDLTFALNLGX^MVALSFVRSPAD 
VELVHEVMDRIGRRVPVIAKLEKPEAIDNLEAIVLAFDAVMVARGDLGVELPLEEVPLVQKRAIQMARE
NAKPVIVATQMLDSMIENSRPTRAEASDVANAVLDGADALMLSGETSVGKYPLAAVRTMSRIICAVEE 
NSTAAPPLTWIPRTKRGVISYAARDIGERLDAKALVAFTQSGDTVRRLARLHTPLPLLAFTAWPEVRS 
QLAMTWGTETFIVPKMQSTDGMIRQVDKSLLELARYKRGDLWIVAGAPPGTVGSTNLIHVHRIGEDD 
V 

>Rv1630 rpsA 30S ribosomal protein S1 TB.seq 1833540:1834982 MW:53203 SEQ ID NO:203 
MPSPTvTSPQVAVNDIGSSEDFLAAIDKTIKYFNDGDIVEGTIVKVDRDEVLLDIGYKTEGVIPARELSIK 
HDVDPNEWSVGDEVEALVLTKEDKEGRLILSKKRAQYERAWGTIEALKEKDEAVKGTVIEWKGGLI 
LDIGLRGFLPASLVEMRRVRDLQPYIGKEIEAKIIELDKNRNNWLSRRAWLEQTQSEVRSEFLNNLQK 
GTIRKGWSSIVNFGAFVDLGGVDGLVHVSELSWKHIDHPSEWQVGDEVTVEVLDVDMDRERVSLS
LKATQEDPWRHFARTHAiGQIVPGKvTKLVPFGAFVRVEEGIEGLVHISEU^RWEVPDQWAVGDD 
AMVKVIDIDLERRRISLSLKQANEDYTEEFDPAKYGMADSYDEQGNYIFPEGFDAETNEWLEGFEKQ 
RAEWEARYAEAERRHKMHTAQMEKFAAAEAAGRGADDQSSASSAPSEKTAGGSLASDAQLAALRE 
KLAGSA 

>Rv1631 - TB.seq 1835011:1836231 MW:44669 SEQ ID NO:204

MLRIGLTGGIGAGKSIXSTTFSQCGGIWDGDVLAREWQPGTEGLASLVDAFGRDILLADGALDRQA 
I^AKAFRDDESRGVLNGIVHPLVARRRSEIIAAVSGDAVVVEDIPLLVESGMAPLFPLWWHADVELR 
VRRLVEQRGMAEADARARIAAQASDQQRRAVADVWLDNSGSPEDLVRRARDVWNTRVQPFAHNL 
AQRQIARAPARLVPADPSWPDQARRIVNRLKIACGHKALRVDHIGSTAVSGFPDFLAKDVIDIQVTVE
SLDVADELAEPLLAAGYPRLEHITQDTEKTOARSTVGRYDHTDSAALWHKRVHASADPGRPTNVHLR 
VHGWPNQQFALLI^DWU^NPGAREDYLWKCDADRI^DGELARYVrAKEPWFLDAYQRAWEWA
DAVHWRP 

>Rv1706c - TB.seq 1932695:1933876 MW:39779 SEQ ID NO:205

I^LDWVNQGHVPPGSVACCLVGVTAVADGIAGHSLSNFGALPPEINSGRMYSGPGSGPLMAAAAA 

WDGLMELSSMTGYGAAISELTNMRWWSGPASDSMVMVLPFVGVVLSTTATLAEQAAMQARAAA 

AAFEAAFAMTVPPPAIAANRTLLMTLVDTNWFGQNTPAIATTESQYAEMWAQDAAAMYGYASAAAP 

ATVLTPFAPPPQTTNATGLVGHATAVAALRGQHSWAAAIPWSDIQKYWMMFLGALATAEGFIYDSG 

GLTLNALQFVGGMLWSTALAEAGAAEAAAGAGGAAGWSAWSQLGAGPVAASATLAAKIGPMSVPP 

GWSAPPATPQAQTVARSIPGIRSAAEAAETSVLLRGAPTPGRSRAAHMGRRYGRRLTVMADRPNVG 

>Rv1745c - similar to Q46822 ORF_0182 TB.seq 1971381:1971989 MW:22490 SEQ ID NO:206

MTRSYRPAPPIERWLLNDRGDATGVADKATVHTGDTPLHLAFSSYVFDLHDQLLITRRAATKRTWP 

AVVVTNSCCGHPLPGESLPGAIRRRl^AELGLTPDRVDLILPGFRYRAAMADGTVENEICPVYRVQVD 

QQPRPNSDEVDAIRWLSWEQFVRDVTAGVIAPVSPWCRSQLGYLTKLGPCPAQWPVADDCRLPKA 

AHGN 

>Rv1800 - TB.seq 2039451:2041415 MW:67068 SEQ ID NO:207

MIJ'NFAVLPPEVNSARVFAGAGSAPMLAAAMWDDLASELHCAAMSFGSVTSGLWGVVWQGSASA 

AMVDAAASYIGVVLSTSAAHAEGAAGI^RAAVSVFEEAUVATVHPAMVAANRAQVASLVASNLFGQN 

APAIAALESLYECMWAQDAAAMAGYYVGASAVATQLASWLQRLQSIPGAASLDARLPSSAEAPMGV 

VRAVNSAIAANAAAAQTVGLVMGGSGTPIPSARYVELANALYMSGSVPGVIAQALFTPQGLYPVWIK 

NLTFDSSVAQGAVILESAIRQQIAAGNNVTVFGYSQSATISSLVMANLAASADPPSPDELSFTLIGNPN 

NPNGGVATRFPGISFPSLGVTATGATPHNLYPTKIYTIEYDGVADFPRYPLNFVSTLNAIAG7YYVHSN 

YFILTPEQIDAAVPLTNTVGPTMTQYYIIRTENLPLLEPLRSVPIVGNPLANLVQPNLKVIVNLGYGDPA 

YGYSTSPPNVATPFGLFPEVSPWIADALVAGTQQGIGDFAYDVSHLELPLPADGSTMPSTAPGSGT 

PVPPLSIDSLIDDLQVANRNLAMT1SKVAATSYATVLPTADIANAALTIVPSYNIHLFLEGIQQALKGDPM 

GLVNAVGYPLAADVALFTAAGGLQLLIIISAGRTIANDISAIVP 

>Rv1844c gnd 6-phosphogluconate dehydrogenase (Gram -) TB.seq 2093732:2095186
MW:51548 SEQ ID NO:208 

MSSSESPAGIAQIGNn-GLAVMGSNIARNFARHGYTVAVHNRSVAKTDALLKEHSSDGKFVRSETIPEF 
LAALEKPRRVLIMVKAGEATDADAVINELADAMEPGDIIIDGGNALYTDTMRREKAMRERGLHFVGAG 
ISGGEEGALNGPSIMPGGPAESYQSLGPLLEEISAHVDGVPCCTHIGPDGSGHFVKMVHNGIEYSDM 
QLIGEAYQLMRDGLGLTAPAIADVFTEWNNGDLDSYLVEITAEVLRQTDAKTGKPLVDVIVDRAEQKG 
TGRVmKSALDLGVPVTGIAEAVFARALSGSVGQRSAASGLASGKLGEQPADPATFTEDVRQALYA 
SKIVAYAQGFNQIQAGSAEFGWDITPGDLATIWRGGCIIRAKFLNHIKEAFDASPNLASLIVAPYFRGA 
VESAIDSWRRWSTMQLGIPTPGFSSALSYYDALRTARLPAALTQAQRDFFGAHTYGRIDEPGKFHT
LWSSDRTEVPV 

>Rv1900c lipJ TB.seq 2146246:2147631 MW:49685 SEQ ID NO:209 

VAQAPHIHRTRYAKCGDMDIAYQVLGDGPTDLLVLPGPFVPIDSIDDEPSLYRFHRRLASFSRVIRLDH
RGVGLSSRUAITTLGPKFWAQDAIA\^DAVGCEQATIFAPSFHAMNGLVLAADYPERVRSUWNGS 
ARPLWAPDYPVGAQVRRADPFLTVALEPDAVERGFDVLSIVAPTVAGDDVFRAWWDLAGNRAGPP 
SIARAVSKVIAEADVRDVLGHIEAPTLILHRVGSTYIPVGHGRYLAEHIAGSRLVELPGTDTLYWVGDT 
GPMLDEIEEFITGVRGGADAERMLATIMFTDIVGSTQHAAALGDDRWRDLLDNHDTIVCHEIQRFGGR 

EVNTAGDGFVATFTSPSAAIACADDIVDAVAALGIEVRIGIHAGEVEVRDASHGTDVAGVAVHIGARVC
ALAGPSEVLVSSTVRDIVAGSRHRFAERGEQELKGVPGRWRLCVLMRDDATRTR 

>Rv1967 - TB.seq 2210599:2211624 MW:36516 SEQ ID NO:210

MRENLGGVWRLGVFU^VCLLTAFLLIAVFGEVRFGDGKTYYAEFANVSNLRTGKLVRIAGVEVGKVT 
RISINPDATVRVQFTADNSVTLTRGTRAVIRYDNLFGDRYI-ALEEGAGGLAVLRPGHTIPLARTQPALD
LDALIGGFKPLFRALNPEQVNALSEQLLHAFAGQGPTIGSLLAQSAAWNTI^RDRLIGQVITNLNW 
LGSLGAHTDRLEM3AVTSLSALIHRLAQRKTDISNAVAYTNAAAGSVADLLSQARAPLAKWRETDRVA 
GIAAADHDYLDNLLNTLPDKYQALVRQGMYGDFFAFYLCDVVLKVNGKGGQPVYIKLAGQDSGRCA 
PK 

>Rv1975 - TB.seq 2218050:2218712 MW:23650 SEQ ID NO:211

MSRRASATCALSATTAVAIMMPMRADDKRLNDGWANVYTVQRQAGCTNDVTINPQLQLAAQWH 
TLDLLNNRHLNDDTGSDGSTPQDRAHAAGFRGKVAETVAINPAVAISGIELINQWYYNPAFFAIMSDC 
ANTQIGVWSENSPDRTWVAVYGQPDRPSAMPPRGAVTGPPSPVAAQENVPIDPSPDYDASDEIEY 
GINWLPWILRGVYPPPAMPPQ

>Rv1981c nrdF ribonucleotide reductase small subunit TB.seq 2224221:2225186 MW:36591
SEQ ID NO:212 

MTGKLVERVHAINWNRLLDAr^LQVWERLTGNFWLPEKIPLSNDI^SWQTLSSTEQQTTIRVFTGLT 
LLDTAQATVGAVAMIDDAVTPHEEAVLTNMAFMESVHAKSYSSIFSTLCSTKQIDDAFDWSEQNPYL
QRKAQIIVDYYRGDDALKRKASSVMLESFLFYSGFYLPMYWSSRGKLTNTADLIRLIIRDEAVHGYYIG 
YKCQRGLADLTDAERADHREYTCELLHTLYANEIDYAHDLYDELGVVTDDVLPYMRYNANKALANLG 
YQPAFDRDTCQVNPAVRAALDPGAGENHDFFSGSGSSYVMGTHQPTTDTDWDF 

>Rv2092c helY helicase, Ski2 subfamily TB.seq 2349335:2352052 MW:99576 SEQ ID NO:213
VTEI^ELDRFTAELPFSLDDFQQRACSALERGHGVLVCAFTGAGKTWGEFAVHIALAAGSKCFYTT 
PLKALSNQKHTDLTARYGRDQIGLLTGDLSVNGNAPWVMTTEVLRNMLYADSPALQGLSYWMDE 
VHFUVDRMRGPVWEEVILQLPDDVRWSLSAWSNAEEFGGWIQTVRGDTTWVDEHRPVPLWQHV

LVGKRMFDLFDYRIGEAEGQPQVNRELLRHIAHRREADRMADWQPRRRGSGRPGFYRPPGRPEVI 

AKLDAEGLLPAITFVFSRAGCDAAVTQCLRSPLRLTSEEERARIAEVIDHRCGDLADSDLAVLGYYEW 

REGLLRGIJkAHHAGMLPAFRHTVEELFTAGLVKAVFATETLALGINMPARTWLERLVKFNGEQHMP 

LTPGEYTQLTGRAGRRGIDVEGHAWIWHPEIEPSEVAGLASTRTFPLRSSFAPSYNMTINLVHRMGP 

QQAHRLLEQSFAQYQADRSWGLVRGIERGNRILGEIAAELGGSDAPILEYARLRARVSELERAQARA 

SRLQRRQAATDAlAALRRGDimTHGRRGGLAWLESARDRDDPRPLVLTEHRWAGRISSADYSGTT 

PVGSMTLPKRVEHRQPRVRRDLASALRSAAAGLVIPAARRVSEAGGFHDPELESSREQLRRHPVHT 

SPGLEDQIRQAERYLRIERDNAQLERKVAAATNSLARTFDRFVGLLTEREFIDGPATDPWTDDGRLL 

ARIYSESDLLVAECLRTGAWEGLKPAELAGWSAWYETRGGDGQGAPFGADVPTPRLRQALTQTS 

RLSTTLRADEQAHRITPSREPDCX3FVRVIYRWSRTGDLAAALAAADVNGSGSPLLAGDFVRWCRQV 

LDLLDQVRNAAPNPELRATAKRAIGDIRRGWAVDAG 

>Rv2101 helZ helicase, Snf2/Rad54 family TB.seq 2360238:2363276 MW:111632
SEQ ID NO:214 

ML\a.HGFWSNSGGMRLWAEDSDLLVKSPSQALRSARPHPFAAPADLIAGIHPGKPATAVLLLPSLRS 

APLDSPELIRLAPRPAARTDPMLLAVVWPWDLDPTAALAAFDQPAPDVRYGASVDYLAELAVFAREL 

VERGRVLPQLRRDTHGAMCWRPVL(MRDWAMTSLVSAMPPVCRAEVGGHDPHELATSALDAMV 

DMVRAALSPMDLLPPRRGRSKRHRAVEAWLTALTCPDGRFDAEPDELDALAEALRPWDDVGIGTV 

GPARATFRLSEVETENEETPAGSLWRLEFLLQSTQDPSLLVPAEQAWNDDGSLRRWLDRPQELLLT 

ELGRASRIFPELVPALRTACPSGLELDADGAYRFLSGTAAVLDEAGFGVLLPSWWDRRRKLGLVLSA 

YTPVDGWGKASKFGREQLVEFRWELAVGDDPLSEEEIAALTETKSPLIRLRGQVWALDTEQMRRGL 

EFLERKPTGRKTTAEILALAASHPDDVDTPLEVTAVRADGWLGDLLAGAAAASLQPLDPPDGFTATLR 

PYQQRGLAWUXFLSSLGLGSCLADDMGLGKTVQLLALETLESVQRHQDRGVGPTaLCPMSLVGN 

WPQEAARFAPNLRWAHHGGARLHGEALRDHLERTDLWSTYTTATRDIDELAEYEWNRWLDEAQ 

AVKNSLSRAAKAVRRLRAAHRVALTGTPMENRLAELWSIMDFLNPGLLGSSERFRTRYAIPIERHGHT 

EPAERLRASTRPYILRRLKTDPAIIDDLPEKIEIKQYCQLTTEQASLYQAWADMMEKIENTEGIERRGN 

VLA^MAKLKQVCNHPAQLLHDRSPVGRRSGKVIRLEEILEEILAEGDRVLCFTQFTEFAELLVPHLAAR 

FGRAARDIAYLHGGTPRKRRDEMVARFQSGDGPPIFLLSLKAGGTGLNLTAANHWHLDRVVWNPAV 

EN(^TDRAFRIGQRRTVQVRKFICTGTLEEKIDEMIEEKKALADL\A/TDGEGWLTELSTRDLREVFAL 

SEGAVGE 

>Rv2110c prcB proteasome [beta]-type subunit 2 TB.seq 2369727:2370599 MW:30274
SEQ ID NO:215 

VTWPLPDRLSINSLSGTPAVDLSSFTDFLRRQAPELLPASISGGAPLAGGDAQLPHGTTIVALKYPGG 
WMAGDRRSTQGNMISGRDVRKVYITDDYTATGIAGTAAVAVEFARLYAVELEHYEKLEGVPLTFAG 
KINRLAIMVRGNLAAAMQGLLALPLLAGYDIHASDPQSAGRIVSFDAAGGWNIEEEGYQAVGSGSLFA 
KSSMKKLYSQVTDGDSGLRVAVEALYDAADDDSATGGPDLVRGIFPTAVIIDAD6AVDVPESRIAELA
RAIIESRSGADTFGSDGGEK 

>Rv2118c - = B2126_C1_165 (83.6%) TB.seq 2377471:2378310 MW:30091 SEQ ID NO:216 
VSATGPFSIGERVQLTDAKGRRYTMSLTPGAEFHTHRGSIAHDAVIGLEQGSWKSSNGALFLVLRPL
LVDYVMSMPRGPQVIYPKDAAQIVHEGDIFPGARVLEAGAGSGALTLSLLRAVGPAGQVISYEQRAD 
HAEHARRNVSGCYGQPPDNWRLWSDLADSELPDGSVDRAVLDMLAPWEVLDAVSRLLVAGGVLM 
VWATVTQLSRIVEALRAKQCWTEPRAWETLQRGWNWGLAVRPQHSMRGHTAFLVATRRLAPGA 
VAPAPLGRKREGRDG 

>Rv2144c - TB.seq 2404166:2404519 MW:12028 SEQ ID NO:217 

MLIIALVLAUGLLALVFANA/TSNQLVAVVVCIGASVLGVALLIVDALRERQQGGADEADGAGETGVAEE 
ADVDYPEEAPEESQAVDAGVIGSEEPSEEASEATEESAVSADRSDDSAK 

>Rv2146c - TB.seq 2405667:2405954 MW:10805 SEQ ID NO:218

LWFFQILGFALFIFWLLLIAR\AA/EFIRSFSRDWRPTG\m^LEIIMSITDPPWVA.RRLIPQLTIGAW 
DLSI MVLLLVAFIGMQLAFGAAA 

>Rv2147c - TB.seq 2406119:2406841 MW:27630 SEQ ID NO:219
VNSHCSHTFITDNRSPRARRGHAMSTLHKVKAYFGMAPMEDYDDEYYDDRAPSRGYARPRFDDDY
GRYDGRDYDDARSDSRGDLRGEPADYPPPGYRGGYADEPRFRPREFDRAEMTRPRFGSWLRNST 
RGAl^MDPRRMAMMFEDGHPLSKITTLRPKDYSEARTIGERFRDGSPVIMDLVSMDNADAKRLVDF 
AAGLAFALRGSFDKVATKVFULSPADVDVSPEERRRIAETGFYAYQ 

>Rv2148c - TB.seq 2406841:2407614 MW:27694 SEQ ID NO:220

MAADLSAYPDRESELTHALAAMRSRLAAAAEAAGRNVGEIELLPITKFFPATDVAILFRLGCRSVGES 
REQEASAKMAELNRLLAAAELGHSGGVHWHMVGRIQRr4KAGSLARWAHTAHSVDSSRLVTALDRA 
WAALAEHRRGERLRVWQVSLDGDGSRGGVDSTTPGAVDRICAQVQESEGLELVGLMGIPPLDWD 
PDEAFDRLQSEHNRVRAMFPHAIGLSAGMSNDLEVAVKHGSTCVRVGTALLGPRRLRSP 

>Rv2150c ftsZ TB.seq 2408386:2409522 MW:38757 SEQ ID NO:221 

MTPPHNYLAVIKWGIGGGGVNAVNRMIEQGLKGVEFIAINTDAQALLMSDADVKLDVGRDSTRGLG 
AGADPEVGRKAAEDAKDEIEELLRGADMVFVTAGEGGGTGTGGAPWASIARKLGALTVGWTRPF 
SFEGKRRSNQAENGIAALRESCDTLIVIPNDRLLQMGDAAVSLMDAFRSADEVLLNGVQGITDLITTP 
GLINVDFADVKGIMSGAGTALMGIGSARGEGRSLKAAEIAINSPLLEASMEGAQGVLMSIAGGSDLGL
FEINEAASLVQDAAHPDANIIFGTVIDDSLGDEVRVTVIAAGFDVSGPGRKPVMGETGGAHRIESAKA 
GKLTSTLFEPVDAVSVPLHTNGATLSIGGDDDDVDVPPFMRR 
>Rv2152c murC TB.seq 2410639:2412120 MW:51146 SEQ ID NO:222

VSTEQLPPDLRRVHMVGIGGAGMSGIARILLDRGGLVSGSDAKESRGVHALRARGALIRIGHDASSL 
DLLPGGATAWTTHAA1PKTNPELVEARRRGIPWLRPAVLAKLMAGR 

LQHCGLDPSFAVGGELGEAGTNAHHGSGDCFVAEADESDGSLLQYTPHVAVITNIESDHLDFYGSVE
AYVAVFDSFVERIVPGGALWCTDDPGGAALAQRATELGIRVLRYGSVPGETMAATLVSWQQQGVG 
AVAHIRLASELATAQGPRVMRLSVPGRHMALNALGALLAAVQIGAPADEVLDGLAGFEGVRRRFELV 
GTCGVGKASVRVFDDYAHHPTEISATLAAARMVLEQGDGGRCMWFQPHLYSRTKAFAAEFGRALN 
MDEVFVLDWGAREQPLAGVSGASVAEHVTVPMRYVPDFSAVAQQVAAMSPGDVIVTMGAGDVT 

LLGPEILTALRVRANRSAPGRPGVLG

>Rv2153c murG TB.seq 2412120:2413349 MW:41829 SEQ ID NO:223 

VKDTVSQPAGGRGATAPRPADAASPSCGSSPSADSVSWLAGGGTAGHVEPAMAVADALVALDPR 
VRITALGTLRGLETRLVPQRGYHLELITAVPMPRKPGGDLARLPSRVWRAVREARDVLDDVDADVW 
GFGGYVALPAYLAARGLPLPPRRRRRIPWIHEANARAGLANRVGAHTADRVLSAVPDSGLRRAEW
GWVRASIAALDRAVLRAEARAHFGFPDDARVLLVFGGSQGAVSLNRAVSGAAADLAAAGVC\A,HA 
HGPQNVLELRRRA(MDPPWAVPYU3RMEI^YAAADLVICRAGAMTVAEVSAVGLPAIYVPLPIGNG 
EQRLNALPWNAGGGMWADAALTPELVARQVAGLLTDPARLAAMTAAAARVGHRDAAGQVARAAL 
AVATGAGARTTT 

>Rv2154c ftsW TB.seq 2413349:2414920 MW:56306 SEQ ID NO:224 

VLTRLLRRGTSDTDGSQTRGAEPVEGQRTGPEEASNPGSARPRTRFGAWLGRPMTSFHLIIAVAAUL 
TTLGLIMVLSASAVRSYDDDGSAVWIFGKQVLWTLVGLIGGYVCLRMSVRFMRRIAFSGFAITIVMLVL 
VLVPGIGKEANGSRGWFWAGFSMQPSELAKMAFAIWGAHLLAARRMERASLREMLIPLVPAAWAL 
ALIVAQPDLGQTVSMGIILLGLLVVYAGLPLRVFLSSIAAWVSAAILAVSAGYRSDRVRSWLNPENDP
QDSGYQARQAKFALAQGGIF6DGLGQGVAKWNYLPNAHNDFIFAIIGEELGLVGALGLLGLFGLFAY 
TGMRIASRSADPFLRLLTATTTLVVVUGQAFINIGYVIGLLPVTGLQLPLISAGGTSTAATLSLIGIIANAAR 
HEPEAVAALRAGRDDKVNRLLRLPLPEPYLPPRLEAFRDRKRANPQPAQTQPARKTPRTAPGQPAR 
QMGLPPRPGSPRTADPPVRRSVHHGAGQRYAGQRRTRRVRALEGQRYG 

>Rv2155c murD TB.seq 2414935:2416392 MW:49314 SEQ ID NO:225 

VLDPLGPGAPVLVAGGRVTGQAVAAVLTRFGATPTVCDDDPVMLRPHAERGLPTVSSSDAVQQITG 
YALWASPGFSPATPLUWKAMGVPIWGDVEUVWRLDAAGCYGPPRSWLWTGTNGKTTTTSMLH 
AMLIAGGRRAVLCGNIGSAVLDVLDEPAELLAVELSSFQLHWAPSLRPEAGAVLNIAEDHLDWHATM 
AEYTAAKARVLTGGVAVAGLDDSRAAALLDGSPAQVRVGFRLGEPAARELGVRDAHLVDRAFSDDL
TLLPVASIPVPGPVGVLDALAAAALARSVGVPAGAIADAVTSFRVGRHRAEWAVADGITYVDDSKAT 
NPHMRASVLAYPRWWIAGGLLKGASLHAEVAAMASRLVGAVLIGRDRAAVAEALSRHAPDVPWQ 
WAGEDTGMPATVEVPVACVLDVAKDDKAGETVGAAVMTAAVAAARRMAQPGDTVLLAPAGASFD
QFTG YADRGEAFATAVRAVI R 

>Rv2156c murX TB.seq 2416397:2417473 MW:37714 SEQ ID NO:226

MRQILIAVAVAVTVSILLTPVLIRLFTKQGFGHQIREDGPPSHHTKRGTPSMGGVAILAGIWAGYLGAH
LAGLAFDGEGIGASGLLVLGLATALGGVGFIDDLIKIRRSRNLGLNKTAKTVGQITSAVLFGVLVLQFRN 
AAGLTPGSADLSWREIATVTLAPVLFVLPCWIVSAWSNAVNFTDGLDGLAAGTMAMVTAAYVLITF 
WQYRNACVTAPGLGCYNVRDPLDLALIAAATAGACIGFLWWNAAPAKIFMGDTGSLALGGVIAGLSV 
TSRTEILA\A^GALFVAEITS\A^QILTFRTTGRRMFRMAPFHHHFELVGWAETTVIIRFWLLTAITCGL 

GVALFYGEWLAAVGA

>Rv2157c murF TB.seq 2417473:2419002 MW:51634 SEQ ID NO:227 

MIELTVAQIAEIVGGAVADISPQDAAHRRWGWEFDSRAIGPGGLFI^PGARADGHDHAASAVAAG 
AAWLAARPVGVPAIWPPVAAPNVLAGVLEHDNDGSGAAVU\AI^KLATAVMQLVAGGLTIIGITGS 

SGKTSTKDLMAAVAAPLGEWAPPGSFNNELGHPWTVLRATRRTDYLILEMAARHHGNIAALAEIAPP
SIGWLNVGTAHLGEFGSREVIAQTKAELPQAVPHSGAWLNADDPAVAAMAKLTAARWRVSRDNT 
GDVWAGPVSLDELARPRFTLHAHDAQAEVRLGVCGDHQVTNALCAAAVALECGASVEQVAAALTAA 
PPVSRHRMQVTTOGDGVTVIDDAYNANPDSMRAGLQAI^WIAHQPEATRRSWAVLGEMAELGEDAI 
AEHDRIGRLAVRLDVSRLNA/VGTGRSISAMHHGAVLEGAWGSGEATADHGADRTAVNVADGDAALA 

LLRAELRPGDWLVKASNAAGLGAVADALVADDTCGSVRP

>Rv2158c murE TB.seq 2419002:2420606 MW:55310 SEQ ID NO:228 

VSSLARGISRRRTEVATQVEAAPTGLRPNAWGVRLAALADQVGAALAEGPAQRAVTEDRTVTGVTL 
RAQDVSPGDLFAALTGSTTHGARHVGDAIARGAVAVLTDPAGVAEIAGRAAVPVLVHPAPRGVLGGL 

AATWGHPSERLTVIGITGTSGKTTTTYLVEAGLRMGRVAGLIGTIGIRVGGADLPSALTTPEAPTLQA
MLAAMVERGVDTWMEVSSHALAL6RVDGTRFAVGAFTNLSRDHLDFHPSMADYFEAKASLFDPDS 
ALRARTAWCIDDDAGRAMAARAADAITVSAADRPAHWRATDVAPTDAGGQQFTAIDPAGVGHHIGI 
RLPGRYNVANCLVALAILDTVGVSPEQAVPGLREIRVPGRLEQIDRGQGFLALVDYAHKPEALRSVLT 
Tl^HPDRRUkWFGAGGDRDPGKPiAPMGRIAAQLADLWVTDDNPRDEDPTAIRREILAGAAEVGGD 

AQWEIADRRDAIRHAVAWARPGDWLIAGKGHETGQRGGGRVRPFDDRVELAAALEALERRA

>Rv2159c - TB.seq 2420632:2421663 MW:36377 SEQ ID NO:229

MKFVNHIEPVAPRRAGGAVAEVYAEARREFGRLPEPLAMLSPDEGLLTAGWATLRETLLVGQVPRG 
RKEAVAAAVAASLRCPWCVDAHTTMLYAAGQTDTAAAILAGTAPAAGDPNAPYVAWAAGTGTPAGP 
PAPFGPDVAAEYLGTAVQFHFIARLVLVLLDETFLPGGPRAQQLMRRAGGLVFARKVRAEHRPGRST
RRLEPRTLPDDLAWATPSEPIATAFAALSHHLDTAPHLPPPTRQWRRWGSWHGEPMPMSSRWTN 
EHTAELPADLHAPTRLALLTGLAPHQVTDDDVAAARSLLDTDAALVGALAWAAFTAARRIGTWIGAAA
EGQVSRQNPTG 

>Rv2163c pbpB TB.seq 2425049:2427085 MW:72506 SEQ ID NO:230 

VSRMPRRASQSQSTRPARGLRRPPGAQEVGQRKRPGKTQKARQAQEATKSRPATRSDVAPAGR
STRARRTRQWDVGTRGASFVFRHRTGNAVILVLMLVAATQLFFLQVSHAAGLRAQAAGQLKVTDV 
QPAARGSIVDRNNDRLAFTIEARALTFQPKRIRRQLEEARKKTSAAPDPQQRLRDIAQEVAGKLNNKP 
DAMVLKKLQSDETFWLAP^VDPAVASAICAKYPEVGAERQDLRQYPGGSLAANWGGIDWDGHG 
LLGLEDSLDAVLAGTDGSVTYDRGSDGWIPGSYRNRHKAVHGSTWLTLDNDIQFYVQQQVQQAK 

NLSGAHNVSAWLDAKTGEVLAMANDNTFDPSQDIGRQGDKQLGNPAVSSPFEPGSVNKIVAASAVI
EHGLSSPDEVLQVPGSIQMGGVTVHDAWEHGVMPYTTTGVFGKSSNVGTLMLSQRVGPERYYDML 
RKFGLGQRTGVGLPGESAGLVPPIDQWSGSTFANLPIGQGLSMTLLQMTGMYQAIANDGVRVPPRII 
KATVAPDGSRTEEPRPDDIRWSAQTAQTVRQMLRAWQRDPMGYQQGTGPTAGVPGYQMAGKT 
GTAQQINPGCGCYFDDVYWITFAGIATADNPRWIGIMLDNPARNSDGAPGHSAAPLFHNIAGWLMQ 

RENVPLSPDPGPPLVLQAT

>Rv2165c - TB.seq 2428236:2429423 MW:42498 SEQ ID NO:231 

VQTRAPWSLPEATLAYFPNARFVSSDRDLGAGAAPGIAASRSTACQTWGGITVADPGSGPTGFGHV 
PVLAQRCFELLTPALTRYYPDGSQAVLLDATIGAGGHAERFLEGLPGLRLIGLDRDPTALDVARSRLV 
RFADRLTLVHTRYDCLGAALAESGYAAVGSVDGILFDLGVSSMQLDRAERGFAYATDAPLDMRMDP
TTPLTAADIVNTYDEAALADILRRYGEERFARRIAAGIVRRRAKTPFTSTAELVALLYQAIPAPARRVGG 
HPAKRTFQALRIAVNDELESLRTAVPAALDALAIGGRIAVLAYQSLEDRIVKRVFAEAVASATPAGLPV 
ELPGHEPRFRSLTHGAERASVAEIERNPRSTPVRLRALQRVEHRAQSQQWATEKGDS 

>Rv2166c - TB.seq 2429428:2429856 MW:15912 SEQ ID NO:232

MFLGTYTPKLDDKGRLTLPAKFRDAUVGGLMWKSQDHSLAVYPRAAFEQLARRASKAPRSNPEAR 

AFLRNLAAGTDEQHPDSQGRITLSADHRRYASLSKDCWIGAVDYLEIWDAQAWQNYQQIHEENFSA 

ASDEALGDIF 

>Rv2197c - TB.seq 2461505:2462146 MW:22481 SEQ ID NO:233
^SRYSAYRRGPDVISPDVIDRILVGACAAVWLVFTGVSVAAAVALMDLGRGFHEMAGNPHTTWVL
YAVIWSALVIVGAIPVLLRARRMAEAEPATRPTGASVRGGRSIGSGHPAKRAVAESAPVQHADAFEV 
AAEWSSEAVDRIWLRGTWLTSAIGIALIAVAAA7YLMAVGHDGPSWISYGLAGVVTAGMPVIEWLYA 
RQLRRWAPQSS 

>Rv2198c - TB.seq 2462149:2463045 MW:30955 SEQ ID NO:234 
MSGPNPPGREPDEPESEPVSDTGDERASGNHLPPVAGGGDKLPSDQTGETDAYSRAYSAPESEHV
TGGPWPADLRLYDYDDYEESSDLDDELAAPRWPWWGVAAIIAAVALWSVSLLVTRPMTSKLATG 
DTTSSAPPVQDEITTTKPAPPPPPPAPPPnEIPTATETQTVmPPPPPPPATTTAPPPATTTTAAAP 
PPTTTTPTGPRQVTYSVTGTKAPGDIISVTYVDA
FRVSKLNCSITTSDGTVLSSNSNDGPQTSC 

>Rv2199c - TB.seq 2463234:2463650 MW:14866 SEQ ID NO:235
MHIEARLFEFVAAFRATTAVLYGVLTSMFATGGVEWAGTTALALTGGMALIVATFFRFVARRLDSRPE
DYEGAEISDGAGELGFFSPHSWWPIMVALSGSVAAVGIALWLPWLIAAGVAFILASAAGLVFEYYVGP 
EKH 

>Rv2200c ctaC TB.seq 2463661:2464749 MW:40449 SEQ ID NO:236 

VTPRGPGRLQRLSQCRPQRGSGGPARGLRQLALAAMLGALAVTVSGCSWSEALGIGWPEGITPEA
HLNRELWIGAVIASLAVGVIVWGUFWSAVFHRKKNTDTELPRQFGYNMPLELVLTVIPFLIISVLFYFT 
WVQEKMLQIAKDPEWIDITSFQWNWKFGYQRVNFKDGTLTYDGADPERKRAMVSKPEGKDKYGE 
ELVGPVRGLNTEDRTYLNFDKVETUGTSTEIPVLVLPSGKRIEFQMASADVIHAFWVPEFLFKRDVMP 
NPVANNSVNVFQIEEITKTGAFVGHCAEMCGTYHSMMNFEVRWTPNDFKAYLQQRIDGKTNAEALR 

Al NQPPLAVTTHPFDTRRGELAPQPVG

>Rv2427c proA g-glutamyl phosphate reductase TB.seq 2724231:2725475 MW:43746
SEQ ID NO:237

MWPAPSQLDLRQEVHDAARP^RVAARRLASLPTTVKDRALHAAADELLAHRDQILAANAEDLNAAR 
EADTPAAMLDRLSLNPQRVDGIAAGLRQVAGLRDPVGEVLRGYTLPNGLQLRQQRVPLGWGMIYE
GRPNVTVDAFGLTLKSGNAALLRGSSSAAKSNEALVAVLRTALVGLELPADAVQLLSAADRATVTHLI 
QARGLVDWIPRGGAGLIEAWRDAQVPTIETGVGNCHVYVHQAADLDVAERILLNSKTRRPSVCNA 
AETLLVDAAIAETALPRLLAALQHAGVTVHLDPDEADLRREYLSLDIAVAWDGVDAAIAHINEYGTGH 
TEAIVTTNLDAAQRFTEQIDAAAVMVNASTAFTDGEQFGFGAEIGISTQKLHARGPMGLPELTSTKWI 
AWGAGHTRPA

>Rv2438c - similar to YHN4_YEAST P38795 TB.seq 2734793:2737006 MW:80492
SEQ ID NO:238 

MGLLGGQSGPRVGSGPVGSIPTPVNAAICQQRGGFHGVERGYSAGDSGVLTSLGDNERTMNFYSA 
YQHGFVRVAACTHHTTIGDPAANAASVLDMARACHDDGAALAVFPELTLSGYSIEDVLLQDSLLDAV
EDALLDLVTESADLLPVLWGAPLRHRHRIYNTAWIHRGAVLGWPKSYLPTYREFYERRQMAPGD 
GERGTIRIGGADVAFGTDLLFAASDLPGFVLHVEICEDMFVPMPPSAEAALAGATVLANLSGSPITIGR 
AEDRRLLARSASARCLMYWAMGEGESTTDLAWDGQTMIWENGALLAESERFPKGVRRSVADVD 
TELLRSERLRMGTFDDNRRHHRELTESFRRIDFALDPPAGDIGLLREVERFPFVPADPQRLQQDCYE 
AYNIQVSGLEQRLRALDYPKWIGVSGGLDSTHALIVATHAMDREGRPRSDILAFALPGFATGEHTKN
NAIKLARALGVTFSEIDIGDTARLMLHTIGHPYSVGEKVYDVTFENVQAGLRTDYLFRIANQRGGIVLG 
TGDLSELALGWSTYGVGDQMSHYNVNAGVPKTLIQHLIRWVISAGEFGEKVGEVLQSVLDTEITPELI 
PTGEEELQSSEAKVGPFALQDFSLFQVLRYGFRPSKIAFUWHAWNDAERGNWPPGFPKSERPSYS
LAEIRHWLQIFVQRFYSFSQFKRSALPNGPKVSHGGALSPRGDWRAPSDMSARIWLDQIDREVPKG 

>Rv2439c proB glutamate 5-kinase TB.seq 2737118:2738245 MW:38789 SEQ ID NO:239

MRSPHRDAIRTARGLWKVGTTALTTPS6M

GLSRRPKDUkTKQAAASVGQVALVNSWSAAFARYGRTVGQVLLTAHDISMRVQHTNAQRTLDRLRA 
LHAVAIVNENDTVATNEIRFGDNDRLSALVAHLVGADALVLLSDIDGLYDCDPRKTADATFIPEVSGPA 
DLDGWAGRSSHLGTGGMASKVAAALI^ADAGVPVLUVPAADAATALADASVGTVFAARPARLSAR 
RFWVRYMEATGALTLDAGAVRAWRQRRSLU^AGITAVSGRFCGGDVVELRAPDAAMVARGWAY 

DASELATMVGRSTSELPGELRRPWHADDLVAVSAKQAKQV

>Rv2440c obg Obg GTP-binding protein TB.seq 2738248:2739684 MW:50430 
SEQ ID NO:240 

VPRFVDRWIHTRAGSGGNGCASVHREKFKPLGGPDGGNGGRGGSIVFWDPQVHTLLDFHFRPHL 
TMSGKHGMGNNRDGMGADLEVKVPEGTVVLDENGRLLADLVGAGTRFEAMGGRGGLGNAA^
SRVRKAPGFALLGEKGQSRDLTLELKTVADVGLVGFPSAGKSSLVSAISAAKPKIADYPFTTLVPNLG 
WSAGEHAFTVADVPGLIPGASRGRGLGLDFLRHIERCAVLVHWDCATAEPGRDPISDIDALETELA 
CYTPTLQGDAALGDLAARPRAWLNKIDVPEARELAEFVRDDIAQRGWPVFCVSTA7RENLQPLIFGL 
SQMISDYNAARPVAVPRRPVIRPIPVDDSGFTVEPDGHGGFWSGARPERWIDQTNFDNDEAVGYL 
ADRLARLGVEEELLRLGARSGCAVTIGEMTFDWEPQTPAGEPVAMSGRGTDPRLDSNKRVGAAER
KAARSRRREHGDG 

>Rv2441c rpmA 50S ribosomal protein L27 TB.seq 2739773:2740030 MW:8969
SEQ ID NO:241 

MAHKKGASSSRNGRDSAAQRLGVKRYGGQVVKAGEILVRQRGTKFHPGVNVGRGGDDTLFAKTAG
AVEFGIKRGRKTVSIVGSTTA 

>Rv2442c rplU 50S ribosomal protein L21 TB.seq 2740048:2740359 MW:11152
SEQ ID NO:242 

MMATYAIVKTGGKQYKVAVGDNA/KVEKLESEQGEKVSLPVALWDGATVTTD
HTKGPKIRIHKFKNKTGYHKRQGHRQQLTVLKVTGIA 

>Rv2448c valS valyl-tRNA synthase TB.seq 2747596:2750223 MW:97822 SEQ ID NO:243
MLPKSWDPAAMESAIYQKWLDAGYFTADPTSTKPAYSIVLPPPNVT^ 
KRMQGYEVLWQPGTDHAGIATQSWEQQUWDGKTKEDLGRELFV^

RLGDGVDWSRDRFTMDEGLSRAVRTIFKRLYDAGLIYRAERLVNWSPVLQTAISDLEVNYRDVEGEL 
VSFRYGSLDDSQPHIWATTRVETMLGDTAIAVHPDDERYRHLVGTSUVHPFVDRELAIVADEHVDPE 
FGTGAVKVTPAHDPNDFEIGVRHQLPMPSILDTKGRIVDTGTRFD6MDRFEARVAVRQALAAQGRV

VEEKRPYLHSVGHSERSGEPIEPRLSLQWVWRVESLAKAAGDAVRNGD7VIHPASMEPRWFSWVD 

DMHDWCISRQLWWGHRjPIWYGPDGEQVCVGPDETPPQGWEQDPDVLDTWFSSALWPFSTLGW 

PDKTAELEKFYPTSVLVTGYDILFFWVARMMMFGTFVGDDAAITLDGRRGPQVPFTDVFLHGLIRDE 

SGRKMSKSKGNVIDPLDVVVEMFGADALRFTLARGASPGGDLAVSEDAVRASRNFGTKLFNATRYAL 

LNGAAPAPLPSPNELTDADRWILGRLEEVRAEVDSAFDGYEFSRACESLYHFAWDEFCDWYLELAK 

TQLAQGLTHTTAVLAAGLDTLLRLLHPVIPFLTEALWLALTGRESLVSADWPEPSGISVDLVAAQRIND 

MQKLVTEVRRFRSDQGLADRQKVPARMHGVRDSDLSNQVAAVTSLAWLTEPGPDFEPSVSLEVRL 

GPEMNRTVWELDTSGTIDVAAERRRLEKELAGAQKELASTAAKLANADFLAKAPDAVIAKIRDRQRV 

AQQETERITTRLAALQ 

>Rv2482c plsB2 TB.seq 2786915:2789281 MW:88284 SEQ ID NO:244 

VTKPAADASAV/LTAEDTLVLASTATPVEMELIMGWLGQQRARHPDSKFDILKLPPRNAPPAALTALVE 

QLEPGFASSPQSGEDRSIVPVRVIWLPPADRSRAGKVAALLPGRDPYHPSQRQQRRILRTDPRRAR 

WAGESAKVSELRQQWRDT7VAEHKRDFAQFVSRRALLALARAEYRILGPQYKSPRLVKPEMLASA 

RFRAGLDRIPGATVEDAGKMLDELSTGWSQVSVDLVSVLGRLASRGFDPEFDYDEYQVAAMRAALE 

AHPAVLLFSHRSYIDGVWPVAMQDNRLPPVHMFGGINLSFGLMGPLMRRSGMIFIRRNIGNDPLYK 

YVLKEYVGYWEKRFNLSWSIEGTRSRTGKMLPPKLGLMSYVADAYLDGRSDDILLQGVSICFDQLH 

EITEYAAYARGAEKTPEGLRWLYNFIKAQGERNFGKIYVRFPEAVSMRQYLGAPHGELTQDPAAKRL 

ALQKMSFEVAWRILQATPVTATGLVSALLLTTRGTALTLDQLHHTLQDSLDYLERKQSPVSTSALRLR 

SREGVRAAADALSNGHPVTRVDSGREPNAWYIAPDDEHAAAFYRNSVIHAFLETSIVELALAHAKHAE 

GDRVAAFWAQAMRLRDLLKFDFYFADSTAFRANIAQEMAWHQDWEDHLGVGGNEIDAMLYAKRPL 

MSDAMLRVFFEAYEIVADVLRDAPPDIGPEELTELALGLGRQFVAQGRVRSSEPVSTLLFATARQVAV 

DQELIAPAADLAERRVAFRRELRNILRDFDYVEQIARNQFVACEFKARQGRDRI 

>Rv2509 - putative oxidoreductase TB.seq 2824676:2825479 MW:28014 SEQ ID NO:245 

MPIPAPSPDARAWTGASQNIGAALATEUVARGHHLIVTARREDVLTELAARLADIO^RVTVDVRPADL 

ADPQERSKLADELMRPISILCANAGTATFGPIASLDLAGEKTQVQLNAVAVHDLTLAVLPGMIERKAG 

GILISGSAAGNSPIPYNAWAATKAFVKrrFSESLRGELRGSGVHVTVLAPGPVRTELPDASEASLVEKL 

VPDFLWISTEHTARVSLNALERNKMRWPGLTSKAMSVASQYAPRAIVAPIVGAFYKRLGGS 

>Rv2524c fas fatty acid synthase TB.seq 2840124:2849330 MW:326226 SEQ ID NO:246 

VT1HEHDRVSADRGGDSPHTTHALVDRLMAGEPYAVAFGGQGSAWLETLEELVSATGIETELATLVG 

EAELLLDPWDELIWRPIGFEPLQVVVRALAAEDPVPSDKHLTSAAVSVPGVLLTQIAATRALARQGM 

DLVATPPVAMAGHSQGVLAVEALKAGGARDVELFALAQLIGAAGTLVARRRGISVLGDRPPMVSVTN 

ADPERIGRLLDEFAQDVRTVLPPVLSIRNGRRAWITGTPEQLSRFELYCRQISEKEEADRKNKVRGG 

DVFSPVFEPVQVEVGFHTPRLSDGIDIVAGWAEKAGLDVAtARELADAILIRKVDWVDEITRVHAAGA 
RWILDLGPGDILTRLTAPVIRGLGIGIVPAATRGGQRNLFTVGATPEVARAWSSYAPTWRLPDGRVK

LSTKFTRLTGRSPILLAGMTPTrVDAKIVAAAANAGHWAELAGGGQVTEEIFGNRIEQMAGLLEPGRT 

YQFNALFLDPYLWKLQVGGKRLVQKARQSGAAIDGWISAGIPDLDEAVELIOELGDIGISHWFKPGT 

lEQIRSVIRIATEVPTKPVIMHVEGGRAGGHHSWEDLDDLLLATYSELRSRANITVCVGGGIGTPRRAA 

EYLSGRWAQAYGFPLMPIDGILVGTAAMATKESTTSPSVKRMLVDTQGTDQWISAGKAQGGMASSR 

SQLGADIHEIDNSASRCGRLLDEVAGDAEAVAERRDEIIAAMAKTAKPYFGDVADMTYLQWLRRYVE 

LAIGEGNSTADTASVGSPWLADTWRDRFEQMLQRAEARLHPQDFGPIQTLFTDAGLLDNPQQAIAAL 

LARYPDAEWQLHPADVPFFVTLCKTLGKPVNFVPVIDQDVRRWWRSDSLWQAHDARYDADAVCIIP 

GTASVAGITRMDEPVGELLDRFEQAAIDEVLGAGVEPKDVASRRLGRADVAGPLAWLDAPDVRWA 

GRTVTNPVHRIADPAEWQVHDGPENPRATHSSTGARLQTHGDDVALSVPVSGTWVDIRFTLPANTV 

DGGTPVIATEDATSAMRTVLAIAAGVDSPEFLPAVANGTATLTVDWHPERVADHTGVTATFGEPLAP 

SLTNVPDALVGPCWPAVFAAIGSAVTDTGEPWEGLLSLVHLDHAARWGQLP7VPAQLTVTATAAN 

ATDTDMGRWPVSWVTGADGAVIATLEERFAILGRTGSAELADPARAGGAVSANATDTPRRRRRDV 

TITAPVDMRPFAWSGDHNPIHTDRAAALU^GLESPIVHGMWLSAAAQHAVTATDGGARPPARLVG 

mARFLGMVRPGDEVDFR\^RVGIDQGAEIVDVAARVGSDLVMSASARLAAPK7VYAFPGQGIQHK 

GMGMEVRARSKAARKNWDTADKRRDTLGFSVLHWRDNPTSIIASGVHYHHPDGVLYLTQFTQVA 

MATVAAAQVAEMREQGAFVEGAIACGHSVGEYTALACVTGIYQLEALLEMVFHRGSKMHDIVPRDEL 

GRSNYRLAAIRPSQIDLDDADVPAFVAGIAESTGEFLEIVNFNLRGSQYAIAGTVRGLEALEAEVERRR 

ELTGGRRSFILVPGIDVPFHSRVLRVGVAEFRRSLDRVMPRDADPDLIIGRYIPNLVPRLFTLDRDFIQ 

EIRDLVPAEPLDEILADYDTWLRERPREMAR7VFIELLAWQFASPVRWIETQDLLFIEEAAGGLGVERF 

VEIGVKSSPTVAGLATNTLKLPEYAHSTVEVLNAERDAAVLFATDTDPEPEPEEDEPVAESPAPDWS 

EAAPVAPAASSAGPRPDDLVFDAADATLALIALSAKMRIDQIEELDSIESITDGASSRRNQLLVDLGSE 

LNLGAIDGAAESDLAGLRSQVTKLARTYKPYGPVLSDAINDQLRTVLGPSGKRPGAIAERVKKTWELG 

EGWAKHVWEVALGTREGSSVRGGAMGHLHEGALADAASVDKVIDAAVASVAARQGVSVALPSAG 

SGGGATIDAAALSEFTDQITGREGVLASAARLVLGQLGLDDPVNALPAAPDSELIDLVTAELGADWPR 

LVAPVFDPKKAWFDDRWASAREDLVKLWLTDEGDIDADWPRLAERFEGAGHWATQATWWQGKS 

LAAGRQIHASLYGRIAAGAENPEPGRYGGEVAWTGASKGSIAASWARLLDGGATVIATTSKLDEER 

UXFYRTLYRDHARYGAALWLVAANMASYSDVDALVEWlGTEQTESLGPQSIHIKDAQTPTLLFPFAAP 

RWGDLSEAGSRAEMEMKVLLWAVQRUIGGLSTIGAERDIASRLHNA/LPGSPNRGMFGGDGAYGEA 

KSALDAWSRWHAESSWAARVSLAHALIGWTRGTGLMGHNDAIVAAVEEAGV1TYSTDEMAALLLD 

LCDAESKVAAARSPIKADLTGGLAEANLDMAELAAKAREQMSAAAAVDEDAEAPGAIAALPSPPRGF 

TPAPPPQWDDLDVDPADLWIVGGAEIGPYGSSRWEMEVENELSAAGVLELAVVTTGLIRWEDDP 

QPGWYDTESGEMVDESELVQRYHDAWQRVGIREFVDDGAIDPDHASPUVSVFLEKDFAFWSSE 

ADARAFVEFDPEHTVIRPVPDSTDWQVIRKAGTEIRVPRKTKLSRWGGQIPTGFDPTVWGISADMA 

GSIDRLAVWNMVATVDAFLSSGFSPAEVMRYVHPSLVANTQGTGMGGGTSMQTMYHGNLLGRNKP 

NDIFQEVLPNIIAAHWQSYVGSYGAMIHPVAACATAAVSVEEGVDKIRLGKAQLWAGGLDDLTLEGII 

GFGDMAATADTSMMCGRGIHDSKFSRPNDRRRLGFVEAQGGGTILLARGDLALRMGLPVLAWAFA 
QSFGDGVHTSIPAPGLGALGAGRGGKDSPLARALAKLGVAADDVAVISKHDTSTLANDPNETELHER
LADALGRSEGAPLFWSQKSLTGHAKGGAAVFQMMGLCQILRDGVIPPNRSLDCVDDELAGSAHFV 
WVRDTLRLGGKFPLKAGMLTSLGFGHVSGLVALVHPQAFIASLDPAQRADYQRRADARLLAGQRRL 
ASAIAGGAPMYQRPGDRRFDHHAPERPQEASMLLNPAARLGDGEAYIG 

>Rv2555c alaS alanyl-tRNA synthase TB.seq 2873772:2876483 MW:97326 SEQ ID NO:247
VQTHEIRKRFLDHFVKAGHTEVPSASVILDDPNLLFVNAGMVQFVPFFLGQRTPPYPTATSIQKCIRTP 
DIDEVGITTRHNTFFQMAGNFSFGDYFKRGAIELAWALLTNSLAAGGYGLDPERIWTTVYFDDDEAV 
RLWQEVAGLPAERIQRRGMADNYWSMGIPGPCGPSSEIYYDRGPEFGPAGGPIVSEDRYLEVWNL 

VFMQNERGEGTTKEDYQILGPLPRKNIDTGMGVERIALVLQDVHNVYETDLLRPV1DTVARVAARAYD
VGNHEDDVRYRIIADHSRTAAILIGDGVSPGNDGRGYVLRRLLRRVIRSAKLLGIDAAIVGDLMATVRN 
AMGPSYPELVADFERISRIAVAEETAFNRTLASGSRLFEEVASSTKKSGA7VLSGSDAFTLHDTYGFPI 
ELTLEMAAETGLQVDEIGFRELMAEQRRRAKADAAARKHAHADLSAYRELVDAGATEFTGFDELRS 
QARILGIFVDGKRVPWAHGVAGGAGEGQRVELVLDRTPLYAESGGQIADEGTISGTGSSEAARAAV 

TDVQKIAKTLVWHRVNVESGEFVEGDTVIAAVDPGWRRGATQGHSGTHMVHAALRQVLGPNAVQA
GSLNRPGYLRFDFNWQGPLTDDQRTQVEEVTNEAVC^FEVRTFTEQLDKAKAMGAIALFGESYPD 
EVRWEMGGPFSLELCGGTHVSNTAQIGPVTILGESSIGSGVRRVEAYVGLDSFRHLAKERALMAGL 
ASSLKVPSEEVPARVANLVERLRAAEKELERVRMASARAAATNAAAGAQRIGNVRLVAQRMSGGMT 
AADLRSLIGDIRGKLGSEPAWALIAEGESQTVPYAVAANPAAQDLGIRANDLVKQLAVAVEGRGGGK 

ADLAQGSGKNPTGIDAALDAVRSEIAVIARVG

>Rv2580c hisS histidyl-tRNA synthase TB.seq 2904822:2906090 MW:45118 SEQ ID NO:248
VTEFSSFSAPKGVPDYVPPDSAQFVAVRDGLLAAARQAGYSHIELPIFEDTALFARGVGESTDWSKE 
MYTFADRGDRSVTLRPEGTAGVVRAVIEHGLDRGALPVKLCYAGPFFRYERPQAGRYRQLQQVGV 
EAIGVDDPALDAEVIAIADAGFRSLGLDGFRLEITSLGDESCRPQYRELLQEFLFGLDLDEDTRRRAGI
NPLRVLDDKRPELRAMTASAPVLLDHLSDVAKQHFDTVLAHLDALGVPWINPRMVRGLDYYTKTAF 
EFVHDGLGAQSGIGGGGRYDGLMHQLGGQDLSGIGFGLGVDRTVUVLRAEGKTAGDSARCDVFGV 
PLGEAAKLRLAVLAGRLRAAGVRVDLAYGDRGLKGAMRAAARSGARVALVAGDRDIEAGTVAVKDL 
TTGEQVSVSMDSWAEVISRLAG 

>Rv2614c thrS threonyl-tRNA synthase TB.seq 2941190:2943265 MW:77123 SEQ ID NO:249
MSAPAQPAPGVDGGDPSQARIRVPAGTTAATAVGEAGLPRRGTPDAIWVRDADGNLRDLSWVPD 
VDTDITPVAANTDDGRSVIRHSTAHVLAQAVQELFPQAKLGIGPPITDGFYYDFDVPEPFTPEDLAALE 
KRMRQIVKEGQLFDRRVYESTEQARAELANEPYKLELVDDKSGDAEIMEVGGDELTAYDNLNPRTR 
ERVWGDLCRGPHIPTTKHIPAFKLTRSSAAYWRGDQKNASLQRIYGTAWESQEALDRHLEFIEEAQR
RDHRKLGVELDLFSFPDEIGSGLAVFHPKGGIVRRELEDYSRRKHTEAGYQFVNSPHITKAQLFHTSG 
HLDWYADGMFPPMHIDAEYNADGSLRKPGQDYYLKPMNCPMHCLIFRARGRSYRELPLRLFEFGTV 
YRYEKSGWHGLTRVRGLTMDDAHIFCTRDQMRDELRSLLRFVLDLLADYGLTDFYLELSTKDPEKF
VGAEEVWEEATTVLAEVGAESGLELVPDPGGAAFYGPKISVQVKDALGRTWQMSTIQLDFNFPERF 
GLEYTAADGTRHRPVMIHRALFGSIERFFGILTEHYAGAFPAWLAFVQWGIPVADEHVAYLEEVATQ 

LKSHGVRAEVDASDDRMAKKIVHHTNHKVPFMVLAGDRDVAAGAVSFRFGDRTQINGVARDDAVAA 
IVAWIADRENAVPTAELVKVAGRE 

>Rv2697c dut deoxyuridine triphosphatase TB.seq 3013683:3014144 MW:15772 SEQ ID NO:250
VSTTLAIVRLDPGLPLPSRAHDGDAGVDLYSAEDVELAPGRRALVRTGVAVAVPFGMVGLVHPRSGL 
ATRVGUSIVNSPGTIDAGYRGEIKVALINLDPAAPIWHRGDRIAQLLVQRVELVELVEVSSFDEAGLAS 
TSR6DGGHGSSGGHASL 

>Rv2782c pepR protease/peptidase, M16 family (insulinase) TB.seq 3089045:3090358 MW:47074
SEQ ID NO:251 

MPRRSPADPAAALAPRRTTLPGGLRWTEFLPAVHSASVGVWVGVGSRDEGATVAGAAHFLEHLLF 

KSTPTRSAVDIAQAMDAVGGELNAFTAKEHTCYYAHVLGSDI-PLAVDLVADNA/LNGRCAADDVEVER 

DWLEEIAMRDDDPEDALADMFLAALFGDHPVGRPVIGSAQSVSVMTRAQLQSFHLRRYTPERMW 

AAAGNVDHDGLVALVREHFGSRLVRGRRPVAPRKGTGRVNGSPRLTLVSRDAEQTHVSLGIRTPGR 

GWEHRWALSVLHTALGGGLSSRLFQEVRETRGLAYSVYSALDLFADSGALSVYAACLPERFADVMR 

VTADVLESVARDGITEAECGIAKGSLRGGLVLGLEDSSSRMSRLGRSELNYGKHRSIEHTLRQIEQVT 

VEEVNAVARHLLSRRYGAAVLGPHGSKRSLPQQLRAMVG 

>Rv2783c gpsI pppGpp synthase and polyribonucleotide phosphorylase TB.seq

3090339:3092594 MW:79736 SEQ ID NO:252 

MSAAEIDEGVFETTATIDNGSFGTRTIRFETGRLALQAAGAWAYLDDDNMLLSATTASKNPKEHFDF 

FPLTVDVEERMYAAGRIPGSFFRREGRPSTDAILTCRLIDRPLRPSFVDGLRNEIQIWTILSLDPGDLY 

DVUMNAASASTQLGGLPFSGPIGGVRVAUDGTVWGFPTVDQIERAVFDMWAGRIVEGDVAIMMVE 

AEATENWELVEGGAQAPTESWAAGLEAAKPFIAALCTAQQELADAAGKSGKPTVDFPVFPDYGED 

VYYSVSSVATDEL^AALTIGGKAERDQRIDEIKTQWQRLADTYEGREKEVGAALRALTKKLVRQRILT 

DHFRIDGRGITDIRALSAEVAWPRAHGSALFERGETQILGVTTLDMIKMAQQIDSLGPETSKRYMHH

YNFPPFSTGETGRVGSPKRREIGHGALAERALVPVLPSVEEFPYAIRQVSEALGSNGSTSMGSVCAS 

TLALLNAGVPLKAPVAGIAMGLVSDDIQVEGAVDGWERRFVTLTDILGAEDAFGDMDFKVAGTKDFV 

TALQLDTKLDGIPSQVU^GALEQAKDARLTILEVMAEAIDRPDEMSPYAPRVmiKVPVDKIGEVIGPK 

GKVINAITEETGAQISIEDDGTVFVGATDGPSAQAAIDKINAIANPQLPWGERFLGTWKTTDFGAFVS 

LLPGRDGLVHISKLGKGKRIAKVEDWNVGDKLRVEIADIDKRGKISLILVADEDSTAAATDAATVTS 

>Rv2793c truB tRNA pseudouridine 55 synthase TB.seq 3102364:3103257 MW:31821 

SEQ ID NO:253 

MSATGPGIWIDKPAGMTSHDWGRCRRIFATRRVGHAGTLDPMATGVLVIGIERATKILGLLTAAPKS 
YAATIRLGQTTSTEDAEGQVLQSVPAKHLTIEAIDAAMERLRGEIRQVPSSVSAIKVGGRRAYRLARQ 
GRSVQLEARPIRIDRFELLMRRRDQLIDIDVEIDCSSGTYIRALARDLGDALQVGGHVTALRRTRVGR
FELDQARSLDDLAERPALSLSLDEACLLMFARRDLTAAEASAAANGRSLPAVGIDGVYAACDADGRVI 

ALLRDEGSRTRSVAVLRPATMHPG 

>Rv2797c - TB.seq 3105619:3107304 MW:58761 SEQ ID NO:254

VPLWADIDRWNAC^VREVFHMSARAEVTTEASRQLAALSIFANSGGKTAEAAAHHNAGIRRDLDA 

HGNEALAVARMDRMDGIVKVQSELAALRHAAAAAELTIDAUNRWPIPGLRSTEAQWARTLAKQT 

ELQAELDA1MAEANAVDEELASAVNMADGDAPIPADSGPPVGPEGLTPTQLASDANEERLREERARL 

QAHLERLQAEYDQLSVRAARDYHNGILDGDAVGRLAALTDELSAARGRLGELDAVDEALSRAPETYL 

TQLQIPEDPNQQVLAAVAVGNPDTAANVSVTVPGVGSTTRGALPGMVTEARDLRSEVIRQLNAAGK 

PASVATIAWMGYHPPPNPUDTGSAGDLWQTMTDGQAHAGAADLSRYLCIQVRANNPSGHLTVLGHS 

YGSLTASLALQDLDAQSAHPVNDWFYGSPGLELYSPAQLGLDHGHAYVMQAPHDLITNLVAPLAPL 

HGWGLDPYLTPGFTELSSQAGFDPGGIWRDGVYAHGDYPRSFLDAAGQPQLRMSGYNLAAIAAGL 

PDNTVGPPLLPPILGGGMPAAPGPALRGGR 

>Rv2864c ponA2 TB.seq 3175454:3177262 MW:63015 SEQ ID NO:255 

MVTKTTLASATSGLLLLAWAMSGCTPRPQGPGPAAEKFFAALAIGDTASAAQLSDNPNEAREALNA 

AWAGLCW\HLDAQVLSAKYAEDTGWAYRFSWHLPKDRIVVTYDGQLKMARDEGR\^RWTTSGL 

HPKLGEHQTFALRADPPRRASVNEVGGTDVLVPGYLYHYSLDAGQAGRELFGTAHAWGALHPFDD 

TLNDPQLIJkECMSSSTQPLDLVTLHADDSNRVAAAIGQLPGWITPQAELLPTDKHFAPAVLNDVKKA 

WDELDGKAGWRWSVNQNGVDVSVLHEVAPSPASSVSITLDRWQNAAQHAVNTRGGKAMIWIK 

PSTGEILAIAQNAGADADGPVATTGLYPPGSTFKMITAGAAVERDLATPETLLGCPGEIDIGHRTIPNY 

GGFDLGWPMSRAFASSCNTTFAELSSRLPPRGLT(MARRYGIGLDYQVDGITTVTGSVPP7VDLAE 

RTEDGFGQGKVLASPFGhMLVAAWMGKTPVPQUAGRPTAVEGDATPlSQKMIDALRPMMRLVVT 

NGTAKEIAGCGEVFGKTGEAEFPGGSHSWFAGYRGDLAFASLIVGGGSSEYAVRMTKVMFESLPPG 

YLA 

>Rv2868c gcpE TB.seq 3179368:3180528 MW:40451 SEQ ID NO:256 

VWGLGMPQPPAPTUVPRRATRQLMVGNVGVGSDHPVSVQSMGTTKTHDVNSTLQQIAELTAAGC 

DIVRVACPRQEDADALAEIARHSQIPWADIHFQPRYIFAAIDAGCAAVRVNPGNIKEFDGRVGEVAKA 

AGAAGIPIRIGVNAGSLDKRFMEKYGKATPEALVESALWEASLFEEHGFGDIKISVKHNDPWMVAAY 

ELLAARCDYPLHLGVTEAGPAFQGTIKSAVAFGALLSRGIGDTIRVSLSAPPVEEVKVGNQVLESLNL 

RPRSLEIVSCPSCGRAQVDVYTLANEVTAGLDGLDVPLRVAVMGCWNGPGEAREADLGVASGNGK 

GQIFVRGEVIKTVPEAQIVETLIEEAMRLAAEMGEQDPGATPSGSPIVTVS 

>Rv2869c - TB.seq 3180548:3181759 MW:42835 SEQ ID NO:257

MMFVTGIVLFAUVILISVALHECGHMVVVARRTGMKVRRYFVGFGPTLWSTRRGETEYGVKAVPLGG 

FCDIAGMTPVEELDPDERDRAMYKQATWKRVAVLFAGPGMNI^CLVLIYAIALVWGLPNLHPPTP^V 

IGETGCVAQEVSQGKLEQCTGPGPMI^GIRSGDWVKVGDTPVSSFDEMAAAVRKSHGSVPIWE 
RDGTAIVTYVDIESTQRWIPNGQGGELQPATVGAIGVGAARVGPVRYGVFSAMPATFAVTGDLTVEV
GKALAALPTKVGALVRAIGGGQRDPQTPISWGASIIGGDTVDHGLWVAFWFFLAQLNLILAAINLLPL 
LPFDGGHIAVAVFERIRNMVRSARGKVAAAPVNYLKLLPATYWLVLWGYMLLTVTADLVNPIRLFQ 
>Rv2870c - TB.seq 3181770:3183077 MW:45324 SEQ ID NO:258 

VATGGRWIRRRGDNEWAHNDEVTNSTDGRADGRLRWVLGSTGSIGTQALQVIADNPDRFEWG 

LAAGGAHLDTLLRQRAQTGVTNIAVADEHAAQRVGDIPYHGSDAATRLVEQTEADNA/LNALVGALGL 

RPTLykALKTGARUVLANKESLVAGGSLVLRAARPGQIVPVDSEHSALAQCLRGGTPDEVAKLVLTAS 

GGPFRGWSAADLEHVTPEQAGAHPTWSMGPMNTLNSASLVNKGLEVIETHLLFGIPYDRID\A/VHP 

QSIIHSMVTFIDGSTIAQASPPDMKLPISLALGWPRRVSGAAAACDFHTASSWEFEPLDTDVFPAVEL 

ARQAGVAGGCMTAWNAANEEAAAAFLAGRIGFPAIVGIIADVLHAADQWAVEPATVDDVLDAQRWA 

RERAQRAVSGMASVAIASTAKPGAAGRHASTLERS 

>Rv2922c smc member of Smc1/Cut3/Cut14 family TB.seq 3234189:3238055 MW:139610 
SEQ ID NO:259

VGAGSRFPLVDPLPSVGARPDRLRGQPRRRTRAGGRPGSARCVPEAAAAAAGRHDTGPRRQSRR 

RLVAVDGADHRVQRAVIWPLVYUKSLTIKGFKSFMPTTLRFEPGITAWGPNGSGKSNWDALAWV 

MGEQGAKTl-RGGKMEDVIFAGTSSRAPLGRAEVTVSIDNSDNALPIEYTEVSITRRMFRDGASEYEIN 

GSSCRLMDVQELLSDSG1GREMHVIVGQGKLEEILQSRPEDRRAFIEEAAGVLXHRKRKEKALRKLDT 

MAANLARLTDLTTELRRQLKPLGRQAEAAQRAAAIQADLRDARLRLAADDLVSRRAEREAVFQAEAA 

MRREHDEAAARLAVASEELAAHESAVAELSTRAESIQHTWFGLSALAERVDATVRIASERAHHLDIEP 

VAVSDTDPRKPEELEAEAQQVAVAEQQLLAELDAARARLDAARAELADRERRAAEADRAHLAAVRE 

EADRREGLARLAGQVETMRARVESIDESVARLSERIEDAAMRAQQTRAEFETVQGRIGELDQGEVG 

LDEHHERTVAALRLADERVAELQSAERAAERQVASLRARIDALAVGLQRKDGAAVVLAHNRSGAGLF 

GSIAQLVKVRSGYEAALAAALGPAADALAVDGLTAAGSAVSALKQADGGRAVLVLSDWPAPQAPQS 

ASGEMLPSGAQWALDLVESPPQLVGAMIAMLSGVAWNDLTEAMGLVEIRPELRAVTVDGDLVGAG 

WVSGGSDRKLSTLEVTSEIDKARSELAMEALAAQLNAAU^GALTEQSARQDAAEQALAALNESDTAI 

SAMYEQLGRLGQEARAAEEEWNRLLQQRTEQEAVRTQTLDDVIQLETQLRKAQETQRVQVAQPIDR 

QAISAAADRARGVEVEARLAVRTAEERANAVRGRADSLRRAAAAEREARVRAQQARAARLHAAAVA 

AAVADCGRLLAGRLHRAVDGASQLRDASAAQRQQRLAAMAAVRDEVNTLSARVGELTDSLHRDEL 

ANAQAALRIEQLEQMVLEQFGMAPADLITEYGPHVALPPTELEMAEFEQARERGEQVIAPAPMPFDR 

VTQERRAKRAERALAELGRVNPlJ\LEEFAALEERYNFLSTQLEDVKAARKDLLGWADVDARILQVFN 

DAFVDVEREFRGVFTALFPGGEGRLRLTEPDDMLTTGIEVEARPPGKKITRLSLLSGGEKALTAVAML 

VAIFRARPSPFYIMDEVEAALDDVNLRRLLSLFEQLREQSQIIIITHQKPTMEVADALYGVTMQNDGITA 

VISQRMRGQQVDQLVTNSS 

>Rv2925c rnc RNAse III TB.seq 3239829:3240548 MW:25400 SEQ ID NO:260

MIRSRQPLLDALGVDLPDELLSLALTHRSYAYENGGLPTNERLEFLGDAVLGLTITDALFHRHPDRSE 

GDU\KLRASWNTQALADVARRLCAEGLGVHVLLGRGEANTGGADKSSILADGMESLLGAIYLQHGM 
EKAREVILRLFGPLLDAAPTLGAGLDWKTSLQELTAARGLGAPSYLVTSTGPDHDKEFTAVWVMDS
EYGSGVGRSKKEAEQKAAAAAWKALEVLDNAMPGKTSA 

>Rv2934 ppsD TB.seq 3262245:3267725 MW:193317 SEQ ID NO:261 

MTSLAERAAQLSPNARAALARELVRAGTTFPTDICEPVAWGIGCRFPGNVTGPESFWQLLADGVDT
IEQVPPDRWDADAFYDPDPSASGRMTTKWGGFVSDVDAFDADFFGITPREAVAMDPQHRMLLEVA 
WEALEHAGIPPDSLSGTRTGVMMGLSSWDYTIVNIERRADIDAYLSTGTPHCAAVGRIAYLLGLRGPA 
VAVDTACSSSLVAIHLACQSLRLRETDVALAGGVQLTLSPFTAIALSKWSALSPTGRCNSFDANADGF 
VRGEGCGNAA/LKRLAOAVRDQDRVLAWRGSATNSDGRSNGMTAPNAI^QRDVITSALKLADVTPD 

SVNYVETHGTG7VLGDPIEFESLAATYGLGKGQGESPCALGSVKTNIGHLEAAAGVAGFIKAVLAVQR
GHIPRNLHFTRWNPAIDASATRLFVPTESAPWPAAAGPRRAAVSSFGLSGTNAHN/WEQAPDTAVAA 
AGGMPWSAUJVSGKTAARVASAMVl^WMSGPGAAAPLADVAHTLNRHRARHAKFATVIARDRA 
EAIAGLRALAAGQPRVGWDCDQHAGGPGRVFVYSGQGSQWASMGQQLLANEPAFAKAVAELDPI 
FVDQVGFSLCK3TLIIX3DEWGIDRIQPVLVGMQLALTELWRSYGVIPDAVIGHSMGEVSAAWAGALT 

PEQGLRVITTRSRLMARLSGQGAMALLELDADMEALIAGYPQNmAVHASPRQTVIAGPPEQVDTVI
AAVATQNRLARRVEVDVASHHPIIDPILPELRSALADLTPQPPSIPIIST7YESAQPVADADYWSANLRN 
PVRFHQAVTAAGVDHNTFIEISPHPVLTHALTDTLDPDGSHTVMSTMNRELOQTLYFHAQLAAVGVA 
ASEHTTGRLVDLPPTPWHHQRFVVVTDRSAMSELAATHPLLGAHIEMPRNGDHVWQTDVGTEVCPW 
LADHKVFGQPIMPAAGFAEIAI^AASEALGTAADAVAPNIVINQFEVEQMLPUJGHTPLTTQLIRGGDS 

QIRVEIYSRTRGGEFCRHATAKVEQSPRECAHAHPEAQGPATGTTVSPADFYALLRQTGQHHGPAF
AALSRIVRLADGSAETEISIPDEAPRHPGYRI.HPWLDAALQSVGAAIPDGEIAGSAEASYLPVSFETIR 
VYRDIGRHVRCRAHLTNLDGGTGKMGRIVLINDAGHIAAEVDGIYLRRVERRAVPLPLEQKIFDAEWT 
ESPIAAVPAPEPAAETTRGSVVLVLADATVDAPGKAQAKSMADDFVQQWRSPMRRVHTADIHDESAV 
UVAFAETAGDPEHPPVGNAA/FVGGASSRLDDELAMRDTVWSITTWRAWGTWHGRSPRLWLVTG 

GGLSVADDEPGTPAAASLKGLVRVLAFEHPDMRTTLVDLDITQDPLTALSAELRNAGSGSRHDDVIA
WRGERRFVERLSRATIDVSKGHPWRQGASYWTGGUGGLGLWARWLVDRGAGRWLGGRSDPT 
DEQCNVLAELQTRAEINAA/RGDVASPGVAEKLIETARQSGGQLRGWHAAAVIEDSLVFSMSRDNLE 
RNAWAPKATGALRMHEATADCELDVVWLGFSSAASLLGSPGQAAYACASAWLDALVGWRRASGLPA 
AVINWGPWSEVGVAQALVGSVLDTISVAEGIEALDSLLAADRIRTGVARLRADRALVAFPEIRSISYFT 

QWEELDSAGDLGDWGGPDALADLOPGEARRAVTERMCARIAAVMGYTDQSTVEPAVPLDKPLTEL
GLDSLMAVRIRNGARADFGVEPPVALILQGASLHDLTADLMRQLGLNDPDPALNNADTIRDRARQRA 
AARHGAAMRRRPKPEVQGG 

>Rv2946c pks1 TB.seq 3291503:3296350 MW:166642 SEQ ID NO:262
VISARSAEALTAQAGRLMAHVQANPGLDPIDVGCSLASRSVFEHRANAA/GASREQLIAGLAGLAAGE
PGAGVAVGQPGSVGKTWVFPGQGAQRIGMGRELYGELPVFAQAFDAVADELDRHLRLPLRDVIW 
GADADLLDSTEFAQPALFAVEVASFAVLRDWGVLPDFVMGHSVGELAAAHAAGVLTLADAAMLWA 
RGRLMQALPAGGAMVAVAASEDEVEPUGEGVGIAAINAPESWISGAQAAANAIADRFMQGRRVH

QLAVSHAFHSPLMEPMLEEFARVAARVQAREPQLGLVSNVTGELAGPDFGSAQYWVDHVRRPVRF 

ADSARHLQTLGATHFIEAGPGSGLTGSIEQSLAPAEAMWSMLGKDRPELASALGAAGQVFTTGVPV 

QWSAVFAGSGGRRVQLPTYAFQRRRFWETPGADGPADAAGLGLGATEHALLGAWERPDSDEWL 

TGRLSUVDQPWUVDHWNGWLFPGAGFVELVIRAGDEVGC^ 

GAADESGHRAVSVYSRGDQSQGWLLNAEGMLGVAAAETPMDLSVWPPEGAESVDISDGYAQLAE 

RGYAYGPAFQGUVAIWRRGSELFAEWAPGEAGVAVDRMGMHPAVLDAVLHALGLAVEKTQASTET 

RLPFCWRGVSLHAGGAGRVRARFASAGADAISVDVCDATGLPVLTVRSLVTRPITAEQLRAAVTAAG 

GASDCK3PLEWWSPISWSGGANGSAPPAPVSWADFCAGSDGDASVWWELESAGGQASSWGS 

VYAATHTALEVLQSWLGADRAATLWLTHGGVGLAGEDISDLAAAAVWGMARSAQAENPGRIVLIDT 

DAAVDASVLAGVGEPQLLVRGGTVHAPRLSPAPALLALPAAESAWRLAAGGGGTLEDLVIQPCPEV 

(MPLC^GQVRVAVMVGVNFRDWAALGMYPGQAPPLGAEGAGVVLETGPEVTDLAVGDAVMGFL 

GGAGPLANA/DQQLVTRVPQGWSFAQAAAVPWFLTAWYGLADLAEIKAGESVLIHAGTGGVGMAAV 

QI^RQWGVEVF\n"ASRGKWDTLRAMGFDDDHIGDSRTCEFEEKFLAWEGRGVDWLDSLAGEFV 

DASLRUVRGGRFLEMGKTDIRDAQEIAANYPGVQYRAFDLSEAGPARMQEMLAEVRELFDTRELH 

RLPVnTWDVRCAPAAFRFMSQARHIGKWLTMPSALADRLADGTWITGATGAVGGVLARHLVGAY 

GVRHLVLASRRGDRAEGAAELAADLTEAGAKVQWACDVADRAAVAGLFAQLSREYPPVRGVIHAA 

GVLDDAVITSLTPDRIDTVLRAKVDAAWNLHQATSDLDLSMFALCSSIAATVGSPGQGNYSAANAFLD 

GLAAHRQAAGIAGISUVWGLWEQPGGMTAHLSSRDLARMSRSGLAPMSPAEAVELFDAALAIDHPL 

AVATLLDRAALDARAQAGALPALFSGLARRPRRRQIDDTGDATSSKSALAQRLHGLAADEQLELLVG 

LVCLQAAAVLGRPSAEDVDPDTEFGDLGFDSLTAVELRNRLKTATGLTLPPTVIFDHPTPTAVAEYVA 

QQMSGSRPTESGDPTSQWEPAAAEVSVHA 

>Rv3014c ligA DNA ligase TB.seq 3372545:3374617 MW:75258 SEQ ID NO:263 

VSSPDADQTAPEVLRQWQALAEEVREHQFRYYVRDAPIISDAEFDELLRRLEALEEQHPELRTPDSP 

TQLVGGAGFATDFEPVDHLERMLSLDNAFTADELAAWAGRIHAEVGDAAHYLCELKIDGVALSLVYR 

EGRLTRASTRGDGRTGEDVTLNARTIADVPERLTPGDDYPVPEVLEVRGEVFFRLDDFQALNASLVE 

EGKAPFANPRNSAAGSLRQKDPAVTARRRLRMICHGLGHVEGFRPATLHQAYLALRAWGLPVSEHT 

TLATDLAGVRERIDYWGEHRHEVDHEIDGWVKVDEVALQRRLGSTSRAPRWAIAYKYPPEEAQTKL 

LDIRVNVGRTGRITPFAFMTPVKVAGSTVGQATLHNASEIKRKGVLIGDTWIRKAGDVIPEVLGPWE 

LRDGSEREFIMPTTCPECGSPLAPEKEGDADIRCPNARGCPGQLRERVFHVASRNGLDIEVLGYEAG 

VALLQAKVIADEGELFALTERDLLRTDLFRTKAGELSANGKRLLVNLDKAKAAPLWRVLVALSIRHVGP 

TAARALATEFGSLDAIAAASTDQLMVEGVGPTIAAAWEWFAVDWHREIVDKWRAAGVRMVDERD 

ESVPRTLAGLTIWTGSLTGFSRDDAKEAIVARGGKAAGSVSKKTNYWAGDSPGSKYDKAVELGVPI 

LDEDGFRRLLADGPASRT 

>Rv3025c - NifS-like protein TB.seq 3383885:3385063 MW:40948 SEQ ID NO:264 

MAYLDHAATTPMHPAAIEAMAAVQRTIGNASSLHTSGRSARRRIEEARELIADKLGARPSEVIFTAGG 

TESDNLAVKGIYWARRDAEPHRRRIVTTEVEHHAVLDSVNWLVEHEGAHVTWLPTAADGSVSATAL 
REALQSHDDVALVSVMWANNEVGTILPIAEMSWAMEFGVPMHSDAIQAVGQLPLDFGASGLSAMS
VAGHKFGGPPGVGALLLRRDVTCVPLMHGGGQERDIRSGTPDVASAVGMATAAQIAVDGLEENSAR 
LRLLRDRLVEGVLAEIDDVCLNGADDPMRLAGNAHFTFRGCEGDALLMLLDANGIECSTGSACTAGV 
AQPSHVLIAMGVDAASARGSLRLSLGHTSVEADVDAALEVLPGAVARARRAALAAAGASR 

>Rv3080c pknK serine-threonine protein kinase TB.seq 3442656:3445985 MW:119420
SEQ ID NO:265 

MTDVDPHATRRDLVPNIPAELLEAGFDNVEEIGRGGFGWYRCVQPSLDRAVAVKVLSTDLDRDNLE 

RFLREQRAMGRLSGHPHIVTVLQVGVLAGGRPFIVMPYHAKNSLETLIRRHGPLDWRETLSIGVKLA 

GALEAAHRVGTLHRDVKPGNILLTDYGEPQLTDFGIARIAGGFETATGVIAGSPAFTAPEVLEGASPTP 

ASDVYSLGATLFCALTGHAAYERRSGERVIAQFLRITSQPIPDLRKQGLPADVAAAIERAMARHPADR 

PATAADVGEELRDVQRRNGVSVDEMPLPVELGVERRRSPEAHAAHRHTGGGTPWPTPPTPATKY 

RPSVPTGSLVTRSRLTDILRAGGRRRLILIHAPSGFGKSTLAAQWREELSRDGAAVAWLTIDNDDNNE 

VWFLSHLLESIRRVRPTLAESLGHVLEEHGDDAGRYVLTSUDEIHENDDRIAWIDDWHRVSDSRTQ 

AALGFLLDNGCHHLQLIVTSWSRAGLPVGRLRIGDELAEIDSAALRFDTDEAAALLNDAGGLRLPRAD 

VQALTTSTDGWAAALRLAALSLRGGGDATQLLRGLSGASDVIHEFLSENVLDTLEPELREFU.VASVT 

ERTCGGLASALAGITNGRAMLEEAEHRGLFLQRTEDDPNWFRFHQMFADFLHRRLERGGSHRVAEL 

HRRASAWFAENGYLHEAVDHALAlAGDPARAVDLVEQDETNLPEQSKMTTLLAIVQKLPTSMWSRA 

RLQLAIAWANILLQRPAPATGALNRFETALGRAELPEATQADLRAEADVLRAVAEVFAORVERVDDLL 

AEAMSRPDTLPPRVPGTAGNTAALAAICRFEFAEVYPLLDWAAPYQEMMGPFGTVYAQCLRGMAAR 

NRLDIVAALQNFRTAFEVGTAVGAHSHAARLAGSLLAELLYETGDLAGAGRLMDESYLLGSEGGAVD 

YIAARYVIGARVKAAQGDHEGAADRLSTGGDTAVQLGLPRLAARINNERIRLGIALPAAVAADLLAPR 

TIPRDNGIATMTAELDEDSAVRLLSAGDSADRDQACQRAGALAAAIDGTRRPLAALQAQILHIETLAAT 

GRESDARNELAPVATKCAELGLSRLLVDAGLA 

>Rv3106 fprA adrenodoxin and NADPH ferredoxin reductase TB.seq 3474004:3475371 
MW:49342 SEQ ID NO:266 

MRPYYIAIVGSGPSAFFAAASLLKAADTTEDLDMAVDMLEMLPTPWGLVRSGVAPDHPKIKSISKQFE 

KTAEDPRFRFFGNVWGEHVQPGELSERYDAVIYAVGAQSDRMLNIPGEDLPGSIAAVDFVGWYNA 

HPHFEQVSPDLSGARAWIGNGNVALDVARILLTDPDVLARTDIADHALESLRPRGIQEWIVGRRGPL 

QMFITLELRELADLDGVDW1DPAELDGITDEDAAAVGKVCKQNIKVLRGYADREPRPGHRRMVFR 

FLTSPIEIKGKRKVERIVLGRNELVSDGSGRVAAKDTGEREELPAQLWRSVGYRGVPTPGLPFDDQ 

SGTIPNVGGRINGSPNEYWGWIKRGPTGVIGTNKKDAQDTVDTLIKNLGNAKEGAECKSFPEDHAD 

QVADWLAARQPKLVTSAHWQVIDAFERAAGEPHGRPRVKLASLAELLRIGLG 

>Rv3235 - TB.seq 3611296:3611934 MW:22659 SEQ ID NO:267

MMASNQTAAQHSSATLQQAPRSIDDAGGCPLTISPIANSPGDTFAVTPWEYEPPPRNIPPCGQSSH 
AARRPHTPQLARRQPIRPSGRAPAAVTSTAKSPRLRQAGTFADAALRRVLEVIDRRRPVGQLRPLLA 
PGLVDSVLAVSRTMGHQQGAAMLRRIRLTPAGPDTADTAAEVFGTYSRGDRIHAIACRVEQRPAGN
ETRWLMVALHIG 

>Rv3255c manA mannose-6-phosphate isomerase TB.seq 3635040:3636263 MW:43340 
SEQ ID NO:268 

VELLRGALRTYAWGSRTA1AEFTGRPVPAAHPEAELWFGAHPGDPAWLQTPHGQTSLLEALVADPE 

GQLGSASRARFGDVLPFLVKVLAADEPLSLQAHPSAEQAVEGYLREERMGIPVSSPVRNYRDTSHK 

PELLVALQPFEALAGFREMRTTELLRALAVSDLDPFIDLLSEGSDADGLRALFTTWITAPQPDIDVLV 

PAVLDGAIQYVSSGATEFGAEAKTVLELGERYPGDAGVLAALLLNRISLAPGEAIFLPAGNLHAYVRG 

FGVEVMANSDNVLRGGLTPKHVDVPELLRVLDFAPTPKARLRPPIRREGLGLVFETPTDEFAATLLVL 

DGDHLGHEVDASSGHDGPQILLCTEGSATVHGKCGSLTLQRGTAAWVAADDGPIRLTAGQPAKLFR 

ATVGL 

>Rv3264c rmlA2 glucose-1-phosphate thymidyltransferase TB.seq 3644897:3645973 MW:37840
SEQ ID NO:269 

UVTHQVDANA/LVGGKGTRLRPLTLSAPKPMLPTAGLPFLTHLLSRIAAAGIEHVILGTSYKPAVFEAEF 

GDGSALGLQIEYVTEEHPLGTGGGIANVAGKLRNDTAMVFNGDVLSGADLAQLLDFHRSNRADVTL 

QLVRVGDPRAFGCVPTDEEDRWAFLEKTEDPPTDQINA6CYVFERNVIDRIPQGREVSVEREVFPA 

LLADGDCKIYGYVDASYWRDMGTPEDFVRGSADLVRGIAPSPALRGHRGEQLVHDGAAVSPGALLI 

GG7WGRGAEIGPGTRLDGAVIFDGVRVEAGCVIERSIIGFGARIGPRALIRDGVIGDGADIGARCELL 

SGARVWPGVFLPDGGIRYSSDV 

>Rv3368c - TB.seq 3780334:3780975 MW:23734 SEQ ID NO:270 

MTLNLSVDEVLTTTRSVRKRLDFDKPVPRDVLMECLELALQAPTGSNSQGWQWVFVEDAAKKKAIA 
DVYLANARGYLSGPAPEYPDGDTRGERMGRVRDSA7YLAEHMHRAPVLLIPCLKGREDESAVGGVS 
FWASLFPAVWSFCLALRSRGLGSCWTTLHLLDI»4GEHKVADVLGIPYDEYSQGGLLPIAYTQGIDFRP 
AKRLPAESVTHWNGW 

>Rv3382c lytB1 TB.seq 3796447:3797433 MW:34667 SEQ ID NO:271 

MAEVFVGPVAQGYASGEVWLI^SPRSFCAGVERAIEWKRVLDVAEGPVWRKQIVHNTVWAELR 

DRGAVFVEDLDEIPDPPPPGANAA/FSAHGVSPAVRAGADERGLQWDATCPLVAKVHAEAARFAAR 

GDTWFIGHAGHEETEGTLGVAPRSTLLVQTPADVAALNLPEGTQLSYLTQTTLALDETADVIDALRA 

RFPTLGQPPSEDICYATTNRQRALQSMVGECDWLVIGSCNSSNSRRLVELAQRSGTPAYLIDGPDDI 

EPEWLSSVSTIGVTAGASAPPRLVGQVIDALRGYASITWERSIATETVRFGLPKQVRAQ 

>Rv3418c groES 10 kD chaperone TB.seq 3836985:3837284 MW:10773 SEQ ID NO:272 

VAKVNIKPLEDKILVQANEAETTTASGLVIPDTAKEKPQEGTWAVGPGRWDEDGEKRIPLDVAEGDT 

VIYSKYGGTEIKYNGEEYLILSARDVLAWSK 

>Rv3423c alr TB.seq 3840193:3841416 MW:43357 SEQ ID NO:273

VKRWEWGKPNDTTDGRGTTSLAMTPISQTPGLLAEAMVDLGAIEHNVRVLREHAGHAQLMAWK 

ADGYGHGATRVAQTALGAGAAELGVATVDEALALRADGITAPVLAWLHPPGIDFGPALLADVQVAVS 

SLRQLDELLHAVRRTGRTATVTVKVDTGLNRNGVGPAQFPAMLTALRQAMAEDAVRLRGLMSHMV 

YADKPDDSINDVQAQRFTAFLAQAREQGVRFEVAHLSNSSATMARPDLTFDLVRPGIAVYGLSPVPA 

LGDMGLVPAMTVKCAVALVK8IRAGEGVSYGHTWIAPRDTNLALLPIGYADGVFRSLGGRLEVLINGR 

RCPGVGRICMDQFMVDLGPGPLDVAEGDEAILFGPGIRGEPTAQDWADLVGTIHYEWTSPRGRITR 

TYREAENR 

>Rv3490 otsA [alpha].-trehalose-phosphate synthase TB.seq 3908232:3909731 MW:55864 
SEQ ID NO:274

MAPSGGQEAQICDSFTFGDSDFVWANRLPVDLERLPDGSTTVVKRSPGGLVTALEPVLRRRRGAW 

VGWPGVNDDGAEPDLHVLDGPIIQDELELHPVRLSTTDIAQYYEGFSNATLWPLYHDVIVKPLYHRE 

WWDRYVDVNQRFAEAASRAAAHGATVWVQDYQLQLVPKMLRMLRPDLTIGFFLHIPFPPVELFMQ 

MPWRTEIIQGLLGADLVGFHLPGGAQNFLILSRRLVGTDTSRGTVGVRSRFGAAVLGSRTIRVGAFPI 

SVDSGALDHAARDRNIRRRAREIRTELGNPRKILLGVDRLDYTKGIDVRLKAFSELLAEGRVKRDDTV 

WQLATPSRERVESYQTLRNDIERQVGHINGEYGEVGHPWHYLHRPAPRDEUAFFVASDVMLVTP 

LRDGMNLVAKEYVACRSDLGGALVLSEFTGAAAELRHAYLVNPHDLEGVKDGIEEALIMQTEEAGRR 

RMRSLRRQVLAHDVDRWAQSFLDALAGAHPRGQG 

>Rv3598c lysS lysyl-tRNA synthase TB.seq 4041423:4042937 MW:55678 SEQ ID NO:275

VSAADTAEDLPEQFRIRRDKRARLLAQGRDPYPVAVPRTHTLAEVRAAHPDLPIDTATEDIVGVAGRV 

IFARNSGKLCFATLQDGDGTQLQVMISLDKVGQAALDAWKADVDLGDIVYVHGAVISSRRGELSVLA 

DCWRIAAKSLRPLPVAHKEMSEESRVRQRYVDUVRPEARAVARLRIAWRAIRTALQRRGFLEVETP 

VLQTLAGGAAARPFATHSNALDIDLYLRIAPELFLKRCIVGGFDKVFELNRVFRNEGADSTHSPEFSM 

LETYQTYGTYDDSAWTRELIQEVADEAIGTRQLPLPDGSVYDIDGEWATIQMYPSLSVALGEEITPQT 

TVDRLRGIADSLGLEKDPAIHDNRGFGHGKLIEELWERTVGKSLSAPTFVKDFPVQTTPLTRQHRSIP 

GVTEKWDLYLRGIELATGYSELSDPWQRERFADQARAAAAGDDEAMVLDEDFLAALEYGMPPCTG 

TGMGIDRLLMSLTGLSIRETVLFPIVRPHSN 

>Rv3600c - similar to Bacillus subtilis protein YacB TB.seq 4043041:4043856 MW:29274
SEQ ID NO:276 

VLU^IDVRNTHTWGLLSGMKEHAKWQQWRIRTESEVTADELALTIDGLIGEDSERLTGTAALSTVPS 
VLHEVRIMLDQYWPSVPHVLIEPGVRTGIPLLVDNPKEVGADRIVNCLAAYDRFRKAAIWDFGSSICV 
DWSAKGEFLGGAIAPGVQVSSDAAAARSAALRRVELARPRSWGKNTVECMQAGAVFGFAGLVDG 
LVGRIREDVSGFSVDHDVAIVATGHTAPLLLPELHTVDHYDQHLTLQGLRLVFERNLEVQRGRLKTAR 

>Rv3606c folK 7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase TB.seq
4048181:4048744 MW:20732 SEQ ID NO:277

MTRWLSVGSNLGDRLARLRSVADGLGDALIAASPIYEADPWGGVEQGQFLNAVLIADDPTCEPREW 

LRRAQEFERAAGRVRGQRWGPRNLDVDLIACYQTSATEALVEVTARENHLTLPHPLAHLRAFVLIPW 

IAVDPTAQL7VAGCPRPVTRLLAELEPADRDSVRLFRPSFDLNSRHPVSRAPES 

>Rv3607c folX may be involved in folate biosynthesis TB.seq 4048744:4049142 MW:14553

MADRIELRGLWHGRHGVYDHERVAGQRF^IDVnWIDLAEAANSDDLADTYDYVRLASRAAEIVAG 

PPRKLIETVGAEIADHVMDDQRVHAVEVAVHKPQAPIPQTFDDVAWIRRSRRGGRGVVWPAGGAV 

>Rv3608c folP dihydropteroate synthase TB.seq 4049138:4049977 MW:28812 SEQ ID NO:278 

VSPAPVQVMGVLNVTDDSFSDGGCYLDLDDAVKHGLAMAAAGAGIVDVGGESSRPGATRVDPAVE 

TSRVIPWKELMQGIWSIDTMRADVARAALQNGAQMVNDVSGGRADPAMGPLIJVEADVPWVLMH 

WRAVSADTPHVPVRYGNWAEVRADLLASVADAVAAGVDPARLVLDPGLGFAKTAQHNWAILHALP 

ELVATGIPVLVGASRKRFLGALl^GPDGVMRPTDGRDTATAVISAl^AUHGAWGVRVHDVRASVDAI 

KWEAWMGAERIERDG 

>Rv3609c folE GTP cyclohydrolase I TB.seq 4049977:4050582 MW:22395 SEQ ID NO:279

MSQLDSRSASARIRWDQQRAEAAVRELLYAIGEDPDRDGLVATPSRVARSYREMFAGLYTDPDSVL 

NTMFDEDHDELVLVKEIPMYSTCEHHLVAFHGVAHVGYIPGDDGRVTGLSKIARLVDLYAKRPQVQE 

RLTSQIADALMKKLDPRGVIWIEAEHLCMAMRGVRKPGSVTTTSAVRGLFKTNAASRAEALDLILRK 

>Rv3610c ftsH inner membrane protein, chaperone TB.seq 4050601:4052880 MW:81987

MNRKNWRTITAIAWVLLGWSFFYFSDDTRGYKPVDTSVAITQINGDNVKSAQIDDREQQLRLILKKG 

NNETDGSEKVITKYPTGYAVDLFNALSAKNAKVSTWNQGSILGELL\rA/LPLLLLVGLFVMFSRMQG 

GARMGFGFGKSRAKQLSKDMPKTTFADVAGVDEAVEELYEIKDFLQNPSRYQALGAKIPKGVLLYGP 

PGTGKTLLARAVAGEAGVPFFT1SGSDFVEMFVGVGASRVRDLFEQAKQNSPCIIFVDEIDAVGRQR 

GAGLGGGHDEREQTLNQLLVEMDGFGDRAGVILIAATNRPDILDPALLRPGRFDRQIPVSNPDLAGR 

RAVLRVHSKGKPMAADADLDGLAKRWGMTGADLANVINEAALLTARENGTVITGPALEEAVDRVIG 

GPRRKGRJISEQEKKITAYHEGGHTLAAWAMPDIEPIYKVTILARGRTGGHAVAVPEEDKGLRTRSEMI 

AQLVFAMGGRAAEELVFREPTTGAVSDIEQATKIARSMVTEFGMSSKLGAVKYGSEHGDPFLGRTM 

GTQPDYSHEVAREIDEEVRKLIEAAHTEAWEILTEYRDVLDTLAGELLEKETLHRPELESIFADVEKRP 

RLTMFDDFGGRIPSDKPPIKTPGELAIERGEPWPQPVPEPAFKAAIAQATQAAEAARSDAGQTGHGA 

NGSPAGTHRSGDRQYGSTQPDYGAPAGWHAPGWPPRSSHRPSYSGEPAPTYPGQPYPTGQADP 

GSDESSAEQDDEVSRTKPAHG 

>Rv3671c - TB.seq 4112322:4113512 MW:40722 SEQ ID NO:280

WTPSQWLDIAVUVVAFIAAISGWRAGALGSMLSFGGVLLGATAGVLLAPHIVSQISAPRAKLFAALFLIL 
ALVAA/GEVAGWLGRAVRGAIRNRPIRLIDSVIGVGVQLWVLTMWLLAMPLTQSKEQPELAAAVKG 
SRVURVNEAAPTWLKTVPKRLSALLNTSGLPAVLEPFSRTPVIPVASPDPALVNNPWAATEPSWKI 
RSLAPRCQIWLEGTGFVISPDRVMTNAHWAGSNNVTVYAGDKPFEATWSYDPSVDVAILAVPHLP 
PPPLVFMEPAKTGADVWLGYPGGQNn"ATPARIREAIRLSGPDIYGDPEP\n*RDVYTI^DVEQGD

SGGPLIDLNGQVLGWFGAAIDDAETGFVLTAGEVAGQLAKIGATQPVGTGACVS 

>Rv3682 ponA2 TB.seq 4121913:4124342 MW:84637 SEQ ID NO:281

MPERLPAAIWLKLAGCCLLASWATALTFPFAGGLGLMSNRASEWANGSAQLLEGQVPAVSTMVD 

AKGNTIAWLYSQRRFEVPSDKIANTMKLAIVSIEDKRFADHSGVDWKGTLTGLAGYASGDLDTRGGS 

TLEQQWKNYQLLVTAQTDAEKRAAVETTPARKLREIRMALTLDKTFTKSEILTRYLNLVSFGNNSFG 

VQDMQTYFGINASDLNWQCWU„LAGMVQSTSTLNPYTNPDGALARRNVVLDTMIENLPGEAEALR 

AAKAEPLGVLPQPNELPRGCIAAGDRAFFCDYVQEYLSRAGISKEQVATGGYLIRTTLDPEVQAPVKA 

AIDKYASPNLAGISSVMSVIKPGKDAHKVLAMASNRKYGLDLEAGETMRPQPFSLVGDGAG^ 

TAAALDMGMGINAQLDVPPRFCMVKGLGSGGAKGCPKETWCWNAGNYRGSMNVTDALATSPNTAF 

AKLISQVGVGRAVDMAIKLGLRSYANPGTARDYNPDSNESLADFVKRQNLGSFTLGPIELNALELSNV 

AATUVSGGVWCPPNPIDQLIDRNGNEVAVTTCT 

AGWDLPMSGKTGTTEAHRSAGFVGFTNRYAAANYIYDDSSSPTDLCSGPLRHCGSGDLYGGNEPS 
RTWFAAMKPIANNFGEVQLPPTDPRYVDGAPGSRVPSVAGLDVDAARQRLKDAGFQVADQTNSVN 
SSAKYGEWGTSPSGQTIPGSIVTIQISNGIPPAPPPPPLPEDGGPPPPVGSQWEIPGLPPITIPLU^ 
PPPAPPP 

>Rv3721c dnaZX DNA polymerase III, [gamma] (dnaZ) and [tau] (dnaX) TB.seq 4164995:4166728
MW:61892 SEQ ID NO:282 

VALYRKYRPASFAEWGQEHVTAPLSVALDAGRINHAYLFSGPRGCGKTSSARILARSLNCAQGPTA 
NPCGVCESCVSLAPNAPGSIDWELDAASHGGVDDTRELRDRAFYAPVQSRYRVFIVDEAHMVTTA 
GFNALLKIVEEPPEHLIFIFATTEPEKVLPTIRSRTHHYPFRLLPPRTMRALLARICEQEGVWDDAVYP 
LVIRAGGGSPRDTL3VLDQLLAGAADTH\nYTRALG^ 

DGGHDPRRFATDLLERFRDLIVLQSVPDAASRGWDAPEDALDRMREQAARIGRATLTRYAEWQA 

GLGEMRGATAPRLLLEWCARLLLPSASDAESALLQRVERIETRLDMSIPAPQAVPRPSAAAAEPKHQ 

PAREPRPVLAPTPASSEPWAAVRSMWPWRDKVRLRSRTTEVM 

ARRLSEQRNADVLAEALKDALGVNWRVRCETGEPAAAASPVGGGANVATAKAVNPAPTANSTQRD 

EEEHMLAEAGRGDPSPRRDPEEVALELLQNELGARRIDNA 

>Rv3783 - TB.seq 4229255:4230094 MW:32337 SEQ ID NO:283

MTFMDAQASFQTQSRTIJkRVRGDLVDGFRRHELWLHLGWQDIKQRYRRSVLGPFWITlATGTTAVA 

MGGLYSKLFRLELSEHLPYVTLGLIVWNLINAAILDGAEVFVANEGLIKQLPAPLSVHW 

FAHNIVIYFVIAIIFPKPWSWADLSFLPALALIFLNCN^ 

WNDETLRRQGAGRWSSIVELNPLLHYLDIVRAPLLGAHQEL^ 

ARVPYWV 

>Rv3789 - TB.seq 4235371:4235733 MW:13378 SEQ ID NO:284

MRFNA/TGGLAGIVDFGLYWLYKVAGLQVDLSKAISFIVGTITAYLINRRmFQAEPSTARFVAV 

GITFAVQVGLNHLCLALLHYRAWAIPVAFVIAQGTATVINFIVQRAVIFRIR 

>Rv3790 - TB.seq 4235776:4237158 MW:50164 SEQ ID NO:285
MLSVGATTTATRLTQWGRTAPSVANVLRTPDAEMIVKAVARVAESG6GRGAIARGLGRSY6DNAQN
GGGLVIDMTPLNTIHSIDADTKLVDIDAGVNLDQLMKAALPFGLWVPVLPGTRQVTVGGAIACDIHGK 
NHHSAGSFGNHVRSMDLLTADGEIRHLTPTGEDAELFWATVGGNGLTGMMRATIEMTPTSTAYFIAD 
GDWASLDETIALHSDGSEARYTYSSAWFDAISAPPKLGRAAVSRGRLATVEQLPAKLRSEPLKFDAP 
QLLTLPDVFPNGLANKYTFGPIGELWYRKSGTYRGKVQNLTQFYHPLDMFGEWNRAYGPAGFLQYQ
FVIPTEAVDEFKKIIGVIQASGHYSFLNVFKLFGPRNQAPLSFPIPGWNICVDFPIKDGLGKFVSELDRR 
VLEFGGRLYTAKDSRTTAETFHAMYPRVDEWISVRRKVDPLRVFASDMARRLELL 
>Rv3791 - TB.seq 4237162:4237923 MW:27470 SEQ ID NO:286

MVLDAVGNPQTVLLLGGTSEIGLAICERYLHNSAARIVLACLPDDPRREDAAAAMKQAGARSVELIDF 
DALDTDSHPKMIEAAFSGGDVDVAIVAFGLLGDAEELWQNQRKAVQIAEINYTAAVSVGVLLAEKMR
AQGFGQIIAMSSAAGERVRRANFVYGSTKAGLDGFYLGLSEALREYGVRVLVIRPGQVRTRMSAHLK 
EAPLTVDKEYVANLAVTASAKGKELVWAPAAFRYVMMVLRHIPRSIFRKLPI 
>Rv3794 embA TB.seq 4243230:4246511 MW:115694 SEQ ID NO:287

VPHDGNERSHRIARLAAWSGIAGLLLCGIVPLLPVNQTTATIFWPQGSTADGNITQITAPLVSGAPRA 

LDISIPCSAIATLPANGGLVLSTLPAGGVDTGKAGLFVRANQDTWVAFRDSVAAVAARSTIAAGGCS
ALHIWADTGGAGADFMGIPGGAGTLPPEKKPQVGGIFTDLKVGAQPGLSARVDIDTRFITTPGALKKA 
VMLLGVLAVLVAMVGUVALDRLSRGRTLRDVVLTRYRPRVRVGFASRLADMVIATLLLWHVIGATSS 
DDGYLL7VARVAPKAGWANYYRYFGTTEAPFDWYTSVLAQLAAVSTAGVWMRLPATLAGIACWLIV 
SRFVLRRLGPGPGGLASNRVAVFTAGAVFLSAVyn_PFNNGLRPEPLIALGVLVTVVVLVERSIALGRLAP 

AAVAIIVATLTATLAPQGLIALAPLLTGARAIAQRIRRRRATDGLLAPLAVLAAALSLITVWFRDQTLATV
AESARIKYKVGPTIAVVYQDFLRYYFLTVESNVEGSMSRRFAVLVLLFCLFGVLFVLLRRGRVAGLASG 
PAWRLIGTTAVGLLLLTFTPTKWAVQFGAFAGLAGVLGAVTAFTFARIGLHSRRNLTLYVTALLFVLA 
WATSGINGWFWGNYGVPWYDIQPVIASHPV^SMFLTLSILTGLIAAVVYHFRMDYAGHTEVKDNRR 
NRILASTPLLWAVIMVAGEVGSMAKAAVFRYPLYTTAKANLTALSTGLSSCAMADDVLAEPDPNAGM 

LQPVPGQAFGPDGPLGGISPVGFKPEGVGEDLKSDPWSKPGLVNSDASPNKPNAAITDSAGTAGG
KGPVGINGSHAALPFGLDPARTPVMGSYGENNLAATATSAWYQLPPRSPDRPLNAA/SAAGAIWSYK 
EDGDFIYGQSLKLQWGVTGPDGRIQPLGQVFPIDIGPQPAWRNLRFPLAWAPPEADVARIVAYDPNL 
SPEQWFAFTPPRVPVLESLQRUGSATPVLMDIATAANFPCQRPFSEHLGIAELPQYRILPDHKQTAA 
SSNLWQSSSTGGPFLFTQALLRTSTIATYLRGDWYRDWGSVEQYHRLVPADQAPDAWEEGVITVP 

GWGRPGPIRALP

>Rv3795 embB TB.seq 4246511:4249804 MW:118023 SEQ ID NO:288 

MTOCASRRKSTPNRAILGAFASARGTRWVATIAGLIGFVLSVATPLLPWQTTAMLDWPQRGQLGSV 
TAPUSLTPWFTATVPCDWRAMPPAGGWLGTAPKQGKDANLQALF\AA/SAQRVDVTDRNW|LS 
VPREQVTSPQCQRIEVTSTHAGTFANFVGLKDPSGAPLRSGFPDPNLRPQIVGVFTDLTGPAPPGLA 
VSATIDTRFSTRPTTLKLLAIIGAIVATWALIALWRLDQLDGRGSIAQLLLRPFRPASSPGGMRRLIPAS
WRTFTLTDAWIFGFLLWHVIGANSSDDGYILGMARVADHAGYMSNYFRWFGSPEDPFGVVYYNLLA 
LMTHVSDASLWMRLPDLAAGLVCWLLLSREVLPRLGPAVEASKPAYWAAAMVLLTAWMPFNNGLR 
PEGIIALGSLVTYVLIERSMRYSRLTPAALAVVTAAFTLGVQPTGLIAVAALVAGGRPMLRILVRRHRLV

GTLPLVSPMU^AGWILTWFADQTLSTVLEATRVRAKIGPSC3AWYTENLRYYYLILPTVDGSLSRRFG 

FLITALCLFTAVFIMLRRKRIPSVARGPAWRLMGVIFGTMFFLMFTPTKWVHHFGLFAAVGAAMAALT 

mVSPSVLRWSRNRI^FUW.FFUJU.CWATTNGVVWYVSSYGVPFNSAMPKIDGITVSTIFFALFAI 

MGYAAWLHFAPRGAGEGRLIR^TTAPVPIVAGFMAAVFVASMVAGIVRQYPTYSNGWSNVRAFV 

GGCGLADDVLVEPDTNAGFMKPLDGDSGSWGPLGPLGGVNPVGFTPNGVPEHTVAEAIVMKPNQP 

GTDYDWDAPTKLTSPGINGSTVPLPYGLDPARVPLAGTYTTGAQQQSTLVSAWYLLPKPDDGHPLV 

WTAAGKIAGNSVLHGYTPGQTWLEYAMPGPGALVPAGRMVPDDLYGEQPKAWRNLRFARAKMP 

ADAVAVRWAEDLSLTPEDWIAVTPPRVPDLRSLQEYVGSTQPVLLDWAVGLAFPCQQPMLHANGIA 

EIPKFRITPDYSAKKLDTDTWEDGTNGGLLGITDLLLRAHVMATYLSRDWARDWGSLRKFDTLVDAP 

PAQLELGTATRSGLWSPGKIRIGP 

>Rv3834c serS seryl-tRNA synthase TB.seq 4307655:4308911 MW:45293 SEQ ID NO:289

VIDLKLLRENPDAVRRSQLSRGEDPALVDALLTADAARRAVISTADSLRAEQKAASKSVGGASPEERP 

PLLRRAKEUVEQVKMEADEVEAEAAFTAAHUVISNVIVDGVPAGGEDDYAVLDWGEPSYLENPKD 

HLELGESLGLIDMQRGAKVSGSRFYFLTGRGALLQLGLLQLALJ<LAVDNGFVPTIPPVLVRPEVMVGT 

GFLGAHAEEVYRVEGDGLYLVGTSEVPLAGYHSGEILDUSRGPLRYAGWSSCFRREAGSHGKDTRG 

IIRVHQFDKVEGFWCTPADAEHEHERLLGWQRQMLARIEVPYRV1DVAAGDLGSSAARKFDCEAWI 

PTQGAYRELTSTSNCTTFQARRLATRYRDASGKPQIAATLNGTIATTRVVLVAILENHQRPDGSVRVP 

DALVPFVGVEVLEPVA 

>Rv3907c pcnA polynucleotide polymerase TB.seq 4391631:4393070 MW:53057 SEQ ID NO:290

VPEAVQEADLLTAAAVALNRHAALLRELGSVFAAAGHELYLVGGSVRDALLGRLSPDLDFTTDARPE 

RVQEIVRPWADAVWDTGIEFGTVGVGKSDHRMEITTFRADSYDRVSRHPEVRFGDCLEGDLVRRDF 

TTNAMAVRVTATGPGEFLDPLGGLAALRAKVLDTPAAPSGSFGDDPLRMLRAARFVSQLGFAVAPR 

VRMIEEMAPQLARISAERVAAELDKLLVGEDPAAGIDLMVQSGMGAWLPEIGGMRMAIDEHHQHK 

DWQHSLTVLRQAIALEDDGPDLVLRWAALLHDIGKPATRRHEPDGGVSFHHHEWGAKMVRKRMR 

ALKYSKQMIDDISQLWLHLRFHGYGDGKWTDSAVRRYVTDAGALLPRLHKLVRADCTTRNKRRAAR 

LQASYDRLEERIAELAAQEDLDRVRPDLDGNQIMAVLDIPAGPQVGEAWRYLKELRLERGPLSTEEA 

TTELLSWWKSRGNR 

A number of embodiments of the invention have been described. Nevertheless, 
it will be understood that various modifications may be made without departing from 
the spirit and scope of the invention. Accordingly, other embodiments are within the scope 
of the following claims. 

WHAT IS CLAIMED IS: 

1. A method for identifying a nucleic acid or a polypeptide sequence that 
may be a target for a drug comprising the following steps: 

(a) providing a first nucleic acid or a polypeptide sequence that is known to 
be a drug target; 

(b) providing at least one algorithm selected from the group consisting of a "domain 
fusion" method, a "phylogenetic profile" method and a "physiologic linkage" method, 
wherein the algorithm is capable of analyzing a functional relationship between nucleic acid or 
polypeptide sequences; and 

(c) comparing the first nucleic acid or the polypeptide drug target sequence to a 
plurality of sequences using at least one of the algorithms as set forth in step (b) to identify a 
second sequence that has a functional relationship to the first sequence, thereby identifying a 
nucleic acid or a polypeptide sequence that may be a target for a drug. 

2. A method for identifying a nucleic acid or a polypeptide sequence that 
may be essential for the growth or viability of an organism comprising the following steps: 

(a) providing a first nucleic acid or a polypeptide sequence that is known to 
be essential for the growth or viability of an organism; 

(b) providing at least one algorithm capable of analyzing a functional relationship 
between nucleic acid or polypeptide sequences selected from the group consisting of a 
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage" 
method; and 

(c) comparing the first nucleic acid or the polypeptide sequence to a plurality of 
sequences using at least one of the algorithms as set forth in step (b) to identify a second 
sequence that has a functional relationship to the first sequence, thereby identifying a nucleic 
acid or a polypeptide sequence that may be essential for the growth or viability of an 
organism. 

3. The method of claim 1 or claim 2, wherein the drug is an anti-microbial 
drug. 

4. The method of claim 1 or claim 2, wherein the first nucleic acid or a 
polypeptide sequence is derived from a pathogen. 



5. The method of claim 4, wherein the pathogen is a microorganism. 



6. The method of claim 1 or claim 2, wherein the microorganism is 
Mycobacterium tuberculosis (MTB). 



7. The method of claim 1 or claim 2, wherein the plurality of sequences 
used to identify a second sequence comprises a database of the gene sequences of an entire 
genome of an organism. 

8. The method of claim 1 or claim 2, wherein the plurality of sequences 
used to identify a second sequence comprises a database of the gene sequences derived from 
a pathogen. 

9. The method of claim 1 or claim 2, wherein the "phylogenetic profile" 
method algorithm comprises 

(a) obtaining data, comprising a list of proteins from at least two genomes; 

(b) comparing the list of proteins to form a protein phylogenetic profile for 
each protein, wherein the protein phylogenetic profile indicates the presence or absence of a 
protein belonging to a particular protein family in each of the at least two genomes based on 
homology of the proteins; and 

(c) grouping the list of proteins based on similar profiles, wherein proteins 
with similar profiles are indicated to have a functional relationship. 
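
By way of illustration only, and not as part of the claim language, steps (a) through (c) of claim 9 can be sketched as follows. The genome names, protein identifiers and homology calls are hypothetical, and "similar" is simplified here to "identical"; a looser criterion such as a small Hamming distance between profiles would be an equally reasonable reading.

```python
from collections import defaultdict


def build_profiles(presence, genomes):
    """Form one presence/absence vector per protein.

    presence maps a protein id to the set of genomes in which a homolog
    was detected; genomes fixes the order of the vector positions.
    """
    return {
        protein: tuple(1 if genome in found_in else 0 for genome in genomes)
        for protein, found_in in presence.items()
    }


def group_by_profile(profiles):
    """Group proteins whose profiles are identical (simplifying assumption)."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Only groups with at least two members suggest a functional relationship.
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical homology calls for four proteins across four genomes.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
presence = {
    "P1": {"genome_A", "genome_C"},
    "P2": {"genome_A", "genome_C"},
    "P3": {"genome_B"},
    "P4": {"genome_A", "genome_B", "genome_C", "genome_D"},
}

profiles = build_profiles(presence, genomes)
linked = group_by_profile(profiles)
```

In this toy input, P1 and P2 share the same pattern of presence and absence and are therefore grouped, while P3 and P4 remain unlinked.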



10. The method of claim 9, wherein the phylogenetic profile is in the form 
of a vector, matrix or phylogenetic tree. 



11. The method of claim 9, comprising determining the significance of 
homology between the proteins by computing a probability (p) value threshold. 

12. The method of claim 11, wherein the probability is set with respect to 
the value 1/NM, based on the total number of sequence comparisons that are to be 
performed, wherein N is the number of proteins in the first organism's genome and M in all 
other genomes. 
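
As a worked illustration of the 1/NM value recited in claim 12 (not part of the claim language), the short sketch below computes the threshold and applies it. The function names and the example counts are hypothetical, and treating a hit as significant when its p value falls below 1/NM is an assumed reading of "set with respect to".

```python
def significance_threshold(n_query_proteins, m_other_proteins):
    """Return 1/(N*M), where N*M is the total number of pairwise comparisons."""
    if n_query_proteins <= 0 or m_other_proteins <= 0:
        raise ValueError("protein counts must be positive")
    return 1.0 / (n_query_proteins * m_other_proteins)


def is_significant(p_value, n_query_proteins, m_other_proteins):
    """Assumed rule: a hit is significant when its p value is below 1/NM."""
    return p_value < significance_threshold(n_query_proteins, m_other_proteins)


# Hypothetical counts: 4,000 proteins in the first genome, 100,000 elsewhere.
threshold = significance_threshold(4000, 100000)
```

With these hypothetical counts there are 4 x 10^8 comparisons, so the threshold is 2.5 x 10^-9; on average fewer than one comparison would be expected to pass it by chance.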

13. The method of claim 9, wherein the presence or absence is determined by 
calculating an evolutionary distance. 

14. The method of claim 13, wherein the evolutionary distance is 
calculated by: 

(a) aligning two sequences from the list of proteins; 

(b) determining an evolution probability process by constructing a conditional 
probability matrix: $p(aa \to aa')$, where aa and aa' are any amino acids, said conditional 
probability matrix being constructed by converting an amino acid substitution matrix from a 
log odds matrix to said conditional probability matrix; 

(c) accounting for an observed alignment of the constructed conditional 
probability matrix by taking the product of the conditional probabilities for each aligned pair 
during the alignment of the two sequences, represented by $P(p) = \prod_{n} p(aa_n \to aa'_n)$; and 

(d) determining an evolutionary distance $\alpha$ from powers equation 
$p^{\alpha}(aa \to aa')$, maximizing for P. 
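
The calculation in steps (b) through (d) of claim 14 can be illustrated with a minimal sketch, offered as an example only and not as part of the claim language. It assumes a toy three-letter alphabet and a made-up one-step conditional probability matrix in place of a matrix derived from a real substitution matrix, and it restricts the distance to integer powers found by exhaustive search; a real implementation would likely allow non-integer powers.

```python
import math

ALPHABET = "ABC"  # toy alphabet standing in for the 20 amino acids (assumption)

# Hypothetical one-step conditional probabilities p(aa -> aa'); rows sum to 1.
P_STEP = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]


def matmul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def log_likelihood(matrix, aligned_pairs):
    """Log of the product of conditional probabilities over aligned pairs."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    return sum(math.log(matrix[index[a]][index[b]]) for a, b in aligned_pairs)


def evolutionary_distance(seq1, seq2, max_alpha=200):
    """Return the integer power of P_STEP that maximizes the alignment likelihood."""
    # Gapped columns are skipped; the sequences are assumed to be pre-aligned.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    best_alpha, best_ll = None, -math.inf
    power = P_STEP
    for alpha in range(1, max_alpha + 1):
        ll = log_likelihood(power, pairs)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
        power = matmul(power, P_STEP)  # advance to the next power of the matrix
    return best_alpha


identical = evolutionary_distance("ABCABC", "ABCABC")
diverged = evolutionary_distance("AAAAAAAAAA", "AAAAAAABBC")
```

Identical sequences are best explained by the smallest power, while an alignment containing mismatches is best explained by a larger one, which is the sense in which the maximizing power serves as a distance.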

15. The method of claim 14, wherein the conditional probability matrix is 
defined by a Markov process with substitution rates, over a fixed time interval. 

16. The method of claim 14, where the conversion from an amino acid 
substitution matrix to a conditional probability matrix is represented by: 

$P(i \to j) = p_j \, 2^{\mathrm{BLOSUM62}_{ij}/2}$, 

where BLOSUM62 is an amino acid substitution matrix, and $P(i \to j)$ is the 
probability that amino acid i is replaced by amino acid j through point mutations according to 
BLOSUM62 scores. 

17. The method of claim 16, where $p_j$'s are the abundances of amino acid 
j and are computed by solving a plurality of linear equations given by the normalization 
condition that: 

$\sum_{j} P(i \to j) = 1$. 
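
The conversion and normalization of claims 16 and 17 can be sketched as below, for illustration only and not as part of the claim language. The sketch reads the conversion as P(i->j) = p_j * 2^(score_ij / 2), which assumes scores in half-bit units, and solves the normalization condition for the abundances by Gaussian elimination. The three-letter score matrix is a toy built from made-up pair frequencies; it is not BLOSUM62, whose rounded integer entries would satisfy the condition only approximately.

```python
import math


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def conditional_probabilities(scores):
    """Return (abundances p_j, matrix P(i->j)) from a half-bit log odds matrix."""
    n = len(scores)
    odds = [[2.0 ** (scores[i][j] / 2.0) for j in range(n)] for i in range(n)]
    # Normalization: sum_j p_j * odds[i][j] = 1 for every row i.
    abundances = solve_linear(odds, [1.0] * n)
    matrix = [[abundances[j] * odds[i][j] for j in range(n)] for i in range(n)]
    return abundances, matrix


# Toy pair frequencies (hypothetical) used only to build a consistent score matrix.
joint = [[0.30, 0.05, 0.05], [0.05, 0.20, 0.05], [0.05, 0.05, 0.20]]
background = [sum(row) for row in joint]
toy_scores = [[2.0 * math.log2(joint[i][j] / (background[i] * background[j]))
               for j in range(3)] for i in range(3)]

abundances, matrix = conditional_probabilities(toy_scores)
```

Because the toy scores are exact log odds, the recovered abundances match the background frequencies used to build them and every row of the resulting matrix sums to one.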

18. The method of claim 1 or claim 2, wherein the "physiologic linkage" 
method algorithm identifies proteins and nucleic acids that participate in a common 
functional pathway. 

19. The method of claim 1 or claim 2, wherein the "physiologic linkage" 
method algorithm identifies proteins and nucleic acids that participate in the 
synthesis of a common structural complex. 

20. The method of claim 1 or claim 2, wherein the "physiologic linkage" 
method algorithm identifies proteins and nucleic acids that participate in a 
common metabolic pathway. 

21. The method of claim 1 or claim 2, wherein the "domain fusion" 
method algorithm comprises 

(a) aligning a first primary amino acid sequence of multiple distinct non-homologous 
polypeptides to second primary amino acid sequence of a plurality of proteins; and 

(b) for any alignment found between the first primary amino acid sequences of all of 
such multiple distinct non-homologous polypeptides and at least one protein of the second 
primary amino acid sequences, outputting an indication identifying the aligned second 
primary amino acid sequence as an indication of a functional link between the aligned first 
and second polypeptide sequences. 
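
Step (b) of claim 21 can be illustrated with the following minimal sketch, given as an example only and not as part of the claim language. The alignment of step (a) is not performed here; its results are assumed to be supplied as precomputed hits, and all identifiers and coordinates are hypothetical. Requiring the two aligned regions on the shared protein not to overlap is an added assumption, used as a simple proxy for the query polypeptides being non-homologous.

```python
from itertools import combinations


def overlaps(region_a, region_b):
    """True if two inclusive (start, end) regions share at least one position."""
    return region_a[0] <= region_b[1] and region_b[0] <= region_a[1]


def domain_fusion_links(hits):
    """Report pairs of query polypeptides that align to one shared protein.

    hits maps a query id to {target id: (start, end) of the aligned region
    on the target}. A link is reported when two different queries align to
    non-overlapping regions of the same target.
    """
    links = []
    for query_1, query_2 in combinations(sorted(hits), 2):
        shared_targets = set(hits[query_1]) & set(hits[query_2])
        for target in sorted(shared_targets):
            if not overlaps(hits[query_1][target], hits[query_2][target]):
                links.append((query_1, query_2, target))
    return links


# Hypothetical precomputed alignment results.
hits = {
    "query_A": {"fusion_X": (1, 400)},
    "query_B": {"fusion_X": (420, 900), "other_Y": (10, 150)},
    "query_C": {"other_Y": (5, 160)},
}

links = domain_fusion_links(hits)
```

Here query_A and query_B both align to separate regions of fusion_X and are reported as functionally linked, whereas query_B and query_C align to the same region of other_Y and are not.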

22. The method of claim 21, wherein the aligning is performed by an 
algorithm selected from the group consisting of a Smith-Waterman algorithm, Needleman-Wunsch 
algorithm, a BLAST algorithm, a FASTA algorithm, and a PSI-BLAST algorithm. 

23. The method of claim 21, wherein the multiple distinct non-homologous 
polypeptides are obtained by translating a nucleic acid sequence from a genome 
database. 

24. The method of claim 21, wherein the plurality of proteins have a 
known function. 

25. The method of claim 21, wherein at least one of the multiple distinct 
non-homologous polypeptides has a known function. 

26. The method of claim 21, wherein at least one of the multiple distinct 
non-homologous polypeptides has an unknown function. 

27. The method of claim 21, wherein the alignment is based on the degree 
of homology of the multiple distinct non-homologous polypeptides to the plurality of 
proteins. 

28. The method of claim 21, further comprising determining the 
significance of the aligned and identified second primary amino acid sequence by computing 
a probability (p) value threshold. 

29. The method of claim 28, wherein the probability threshold is set with 
respect to the value 1/NM, based on the total number of sequence comparisons that are to be 
performed, wherein N is the number of proteins in a first organism's genome and M in all 
other genomes. 


30. The method of claim 21, further comprising filtering excessive 
functional links between one first primary amino acid sequence of multiple distinct 
non-homologous polypeptides and an excessive number of other distinct non-homologous 
polypeptides for any alignment found between the first primary amino acid sequences of the 
distinct non-homologous polypeptides and at least one of the second primary amino acid 
sequences of the plurality of proteins. 
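
One possible form of the filtering in claim 30 is sketched below, for illustration only and not as part of the claim language. It assumes that "excessive" means a polypeptide linked to more than a chosen number of distinct partners; the cutoff `max_partners` and the identifiers are hypothetical, and the claim does not specify either.

```python
from collections import defaultdict


def filter_promiscuous_links(links, max_partners):
    """Drop every link that involves a polypeptide with too many partners.

    links is a list of (polypeptide_1, polypeptide_2) pairs; max_partners is
    an assumed cutoff on the number of distinct partners per polypeptide.
    """
    partners = defaultdict(set)
    for first, second in links:
        partners[first].add(second)
        partners[second].add(first)
    promiscuous = {p for p, linked in partners.items() if len(linked) > max_partners}
    return [
        (first, second)
        for first, second in links
        if first not in promiscuous and second not in promiscuous
    ]


# Hypothetical links: "hub" is linked to three partners, the others to one each.
raw_links = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("p4", "p5")]
kept_links = filter_promiscuous_links(raw_links, max_partners=2)
```

With a cutoff of two partners, every link involving "hub" is discarded and only the link between p4 and p5 is kept.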

31. A computer program product, stored on a computer-readable medium, 
for identifying a nucleic acid or a polypeptide sequence that may be a target for a drug, the 
computer program product comprising instructions for causing a computer system to be 
capable of: 

(a) inputting a first nucleic acid or a polypeptide sequence that is known to be 
a drug target; 

(b) accessing at least one algorithm capable of analyzing a functional relationship 
between nucleic acid or polypeptide sequences selected from the group consisting of a 
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage" 
method; and 

(c) comparing the first nucleic acid or the polypeptide drug target sequence to 
a plurality of sequences using at least one of the algorithms set forth in step (b) to identify a 
second sequence that has a functional relationship to the first sequence and generating an 
output identifying a nucleic acid or a polypeptide sequence that may be a target for a drug. 

32. A computer program product, stored on a computer-readable medium, 
for identifying a nucleic acid or a polypeptide sequence that may be essential for the growth 
or viability of an organism, the computer program product comprising instructions for 
causing a computer system to be capable of: 

(a) providing a first nucleic acid or a polypeptide sequence that is known to 
be essential for the growth or viability of an organism; 

(b) accessing at least one algorithm capable of analyzing a functional relationship 
between nucleic acid or polypeptide sequences selected from the group consisting of a 
"domain fusion" method, a "phylogenetic profile" method and a "physiologic linkage" 
method; and 

(c) comparing the first nucleic acid or the polypeptide sequence to a plurality of 
sequences using at least one of the algorithms set forth in step (b) to identify a second 
sequence that has a functional relationship to the first sequence and generating an output 
identifying a nucleic acid or a polypeptide sequence that may be essential for the growth or 
viability of an organism. 

33. A computer system, comprising: 

(a) a processor; and 

(b) a computer program product as set forth in claim 31 or claim 32. 

Figure 1 

Figure 2 

Figure 3 

Figure 4 

Figure 5 